
Kimia Ghobadi
John C. Malone Assistant Professor · Johns Hopkins University · Civil Engineering
Active 2011–2026
About
Kimia Ghobadi is a John C. Malone Assistant Professor in the Department of Civil and Systems Engineering at Johns Hopkins University. Her research focuses on using mathematical models, optimization techniques, and data analytics to solve problems in complex systems, particularly in healthcare systems and medical decision-making environments. She develops models and solution techniques in inverse optimization, mixed-integer programming, and online algorithms. Her current projects include inverse optimization models for personalized diet, radiation therapy treatment planning for cancer patients, health systems capacity management and resource allocation, risk-assessment tools for falls in frail and elderly patients, time series prediction models and large language models in patient digital twins, COVID-19 simulation and its impact on disparities, and scheduling and process efficiency in hospitals and home care settings.
Research topics
- Computer Science
- Medicine
- Operations research
- Political Science
- Computer Security
- Engineering
- Emergency medicine
- Bioinformatics
- Operations management
- Internal medicine
- Medical emergency
- Data science
- Management science
- Intensive care medicine
- Virology
- Risk analysis (engineering)
- Business
Selected publications
An interpretable data-driven approach to optimizing clinical fall risk assessment
arXiv (Cornell University) · 2026-01-08
preprint · Open access · Senior author
In this study, we aim to better align fall risk prediction from the Johns Hopkins Fall Risk Assessment Tool (JHFRAT) with additional clinically meaningful measures via a data-driven modelling approach. We conducted a retrospective cohort analysis of 54,209 inpatient admissions from three Johns Hopkins Health System hospitals between March 2022 and October 2023. A total of 20,208 admissions were included as high fall risk encounters, and 13,941 were included as low fall risk encounters. To incorporate clinical knowledge and maintain interpretability, we employed constrained score optimization (CSO) models to reweight the JHFRAT scoring weights, while preserving its additive structure and clinical thresholds. Recalibration refers to adjusting item weights so that the resulting score can order encounters more consistently by the study's risk labels, without changing the tool's form factor or deployment workflow. The model demonstrated significant improvements in predictive performance over the current JHFRAT (CSO AUC-ROC=0.91, JHFRAT AUC-ROC=0.86). This performance improvement translates to protecting an additional 35 high-risk patients per week across the Johns Hopkins Health System. The constrained score optimization models performed similarly with and without the EHR variables. Although the benchmark black-box model (XGBoost) improves upon the performance metrics of the knowledge-based constrained logistic regression (AUC-ROC=0.94), the CSO demonstrates more robustness to variations in risk labeling. This evidence-based approach provides a robust foundation for health systems to systematically enhance inpatient fall prevention protocols and patient safety using data-driven optimization techniques, contributing to improved risk assessment and resource allocation in healthcare settings.
From Non-Identifiability to Goal-Integrated Decision-Making in Parametric Inverse Optimization
ArXiv.org · 2026-03-17
article · Open access · Senior author
Inverse optimization seeks to recover unknown objective parameters from observed decisions, yet fundamental questions about when recovery is possible have received limited formal treatment. This paper develops a comprehensive theoretical framework for inverse optimization in parametric convex models. We first establish that non-identifiability is the generic case: even with normalization and multiple observations, the parameter set compatible with data is generically multi-dimensional, and regularization does not resolve this. We derive necessary and sufficient conditions for identifiability. Motivated by these negative results, we introduce the Inverse Learning (IL) framework, which shifts the inferential target from the unknown parameter to the latent optimal solution, achieving a complexity reduction that is independent of the number of observations. IL explicitly characterizes the full set of compatible parameters rather than returning an arbitrary element. To address the tension between observational fidelity and constraint adherence, we formalize the Observation-Constraint Tradeoff and develop Goal-Integrated Inverse Learning models that enable structured navigation of this spectrum with guaranteed monotonicity. Numerical experiments demonstrate superior solution accuracy, higher parameter recovery rates, and significant computational speedups. We apply the framework to personalized dietary recommendations using NHANES data, with a proof-of-concept prospective feasibility study demonstrating improved glycemic control.
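The non-identifiability claim in this abstract can be illustrated with a toy example. The sketch below is not the paper's parametric convex model or its Inverse Learning framework; it assumes a hypothetical linear program (maximize c . x over the unit square), a single observed decision at the vertex (1, 1), and integer cost vectors normalized to an L1 norm of 10. Even with that normalization, many cost vectors explain the same observation.

```python
from itertools import product

# Hypothetical forward problem: maximize c . x over the unit square [0, 1]^2.
# A linear objective attains its maximum at a vertex, so checking vertices suffices.
VERTICES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def is_optimal(c, x_obs, vertices=VERTICES):
    """Return True if x_obs attains the maximum of c . x over the vertices."""
    def value(x):
        return c[0] * x[0] + c[1] * x[1]
    return value(x_obs) >= max(value(v) for v in vertices)


def compatible_costs(x_obs, l1_norm=10):
    """List integer cost vectors with |c1| + |c2| == l1_norm that make x_obs optimal."""
    grid = range(-l1_norm, l1_norm + 1)
    return [
        (c1, c2)
        for c1, c2 in product(grid, repeat=2)
        if abs(c1) + abs(c2) == l1_norm and is_optimal((c1, c2), x_obs)
    ]


# Every nonnegative normalized cost vector is compatible with observing (1, 1).
costs = compatible_costs((1, 1))
print(len(costs), costs[:3])
```

In this toy setting the compatible set is the whole nonnegative segment of the normalized cost vectors rather than a single point, which is the kind of multi-dimensional ambiguity the abstract describes.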
Eliciting Chain-of-Thought Reasoning for Time Series Analysis using Reinforcement Learning
arXiv (Cornell University) · 2025-10-01
preprint · Open access · Senior author
Complex numerical time series analysis often demands multi-step reasoning capabilities beyond current models' reach. Tasks like medical diagnosis and weather forecasting require sequential reasoning processes - including counterfactual analysis, logical deduction, knowledge application, and multi-modal contextual integration - that existing time series models cannot explicitly perform. While recent research has shown large language models (LLMs) can achieve sophisticated Chain-of-Thought (CoT) reasoning through reinforcement learning (RL), these advances have primarily focused on mathematical and coding domains, with LLMs still demonstrating poor performance on time series tasks. We introduce Chain Of thought for Understanding Numerical Time Series (COUNTS), the first framework that trains LLMs to perform CoT reasoning across diverse time series tasks using RL with verifiable rewards. Our approach employs a Residual Vector-Quantized VAE to create high-fidelity discrete tokens that seamlessly integrate into a pre-trained LLM's vocabulary. COUNTS undergoes a two-stage training process: first, supervised fine-tuning on time series analysis tasks to master our novel representations, followed by Group Relative Policy Optimization training on verifiable problems using prompting strategies that encourage explicit reasoning steps before producing final answers. Our experiments demonstrate that this RL-driven approach with intermediate CoT reasoning significantly enhances LLM performance across various time series analysis tasks, opening new possibilities for complex temporal data reasoning.
TsLLM: Augmenting LLMs for General Time Series Understanding and Prediction
arXiv (Cornell University) · 2025-10-01
preprint · Open access · Senior author
Time series data is fundamental to decision-making across many domains including healthcare, finance, power systems, and logistics. However, analyzing this data correctly often requires incorporating unstructured contextual information, answering domain-specific questions, and generating natural language explanations - capabilities that traditional time series models lack. While Large Language Models (LLMs) excel at contextual reasoning and knowledge integration, they struggle with numerical time series due to inefficient text-based representations and limited exposure to numerical data during pretraining. We address this gap by augmenting an LLM with specialized time series perception through a patch-based encoder-decoder architecture. We train this Time Series augmented LLM (TsLLM) on a large corpus of over 25 billion tokens of interleaved time series and text spanning diverse tasks: forecasting with contextual information, question-answering, anomaly detection, classification, report generation, and more, all unified as next token prediction. This training enables TsLLM to leverage both its language understanding and newly acquired temporal reasoning capabilities. While not designed to surpass specialized models on traditional benchmarks, TsLLM demonstrates strong performance on tasks requiring the integration of time series analysis with natural language - capabilities that existing approaches cannot provide. It also exhibits strong zero-shot and few-shot performance, showing it can adapt to new data without additional training.
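The abstract mentions a patch-based encoder-decoder but does not give its details. As a rough illustration of what "patching" a series generally means, the sketch below splits a one-dimensional series into fixed-length windows. The function name `make_patches`, the patch length, the stride, and the choice to drop an incomplete tail are all assumptions made for this example, not TsLLM's actual design.

```python
def make_patches(series, patch_len, stride):
    """Split a 1-D sequence into fixed-length windows (patches).

    Windows start every `stride` steps; an incomplete tail is dropped.
    This is a generic illustration, not the TsLLM encoder itself.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


# Hypothetical input: ten evenly spaced readings.
readings = list(range(10))
patches = make_patches(readings, patch_len=4, stride=2)
print(patches)
```

Each patch would then typically be mapped to an embedding by a learned encoder, so the language model sees a few patch tokens instead of one token per digit.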
Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning
arXiv (Cornell University) · 2025-10-03
preprint · Open access
We study Non-Stationary Reinforcement Learning (RL) under distribution shifts in both finite-horizon episodic and infinite-horizon discounted Markov Decision Processes (MDPs). In the finite-horizon case, the transition functions may suddenly change at a particular episode. In the infinite-horizon setting, such changes can occur at an arbitrary time step during the agent's interaction with the environment. While the Q-learning Upper Confidence Bound algorithm (QUCB) can discover a proper policy during learning, due to the distribution shifts, this policy can exploit sub-optimal rewards after the shift happens. To address this issue, we propose Density-QUCB (DQUCB), a shift-aware Q-learning UCB algorithm, which uses a transition density function to detect distribution shifts, then leverages its likelihood to enhance the uncertainty estimation quality of Q-learning UCB, resulting in a balance between exploration and exploitation. Theoretically, we prove that our oracle DQUCB achieves a better regret guarantee than QUCB. Empirically, our DQUCB enjoys the computational efficiency of model-free RL and outperforms QUCB baselines by having a lower regret across RL tasks, as well as a COVID-19 patient hospital allocation task using a Deep-Q-learning architecture.
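For readers unfamiliar with the baseline, a single optimistic Q-learning update can be sketched as follows. This is a generic tabular illustration of a count-based exploration bonus, loosely modeled on Hoeffding-style UCB Q-learning; it is not DQUCB, and it contains no shift detection or transition density. The function name `ucb_q_update`, the constants `c` and `iota`, and the clipping of Q at the horizon are assumptions made for this example.

```python
import math


def ucb_q_update(q, counts, state, action, reward, next_value,
                 horizon, c=0.1, iota=1.0):
    """Apply one optimistic Q-learning update and return the new Q-value.

    q and counts are dicts keyed by (state, action). Unvisited pairs start
    at the optimistic value `horizon`. The bonus shrinks as 1 / sqrt(visits).
    """
    key = (state, action)
    counts[key] = counts.get(key, 0) + 1
    t = counts[key]
    alpha = (horizon + 1) / (horizon + t)           # step size, decays with visits
    bonus = c * math.sqrt(horizon ** 3 * iota / t)  # exploration bonus
    old = q.get(key, float(horizon))
    target = reward + next_value + bonus
    # Clipping at the horizon is a simplification assumed for this sketch.
    q[key] = min(float(horizon), (1 - alpha) * old + alpha * target)
    return q[key]


# Hypothetical two-step problem: visit the same state-action pair twice.
q_table, visit_counts = {}, {}
first = ucb_q_update(q_table, visit_counts, "s0", "a0", 0.5, 0.0, horizon=2)
second = ucb_q_update(q_table, visit_counts, "s0", "a0", 0.5, 0.0, horizon=2)
print(first, second)
```

Because the bonus depends only on visit counts, it keeps shrinking even if the environment changes, which is the weakness under distribution shift that the abstract says a shift-aware method is meant to address.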
MedTsLLM: Medical Time Series Analysis Using Multimodal LLMs
IEEE Journal of Biomedical and Health Informatics · 2025-01-01
article · Senior author
Traditional machine learning approaches for biomedical time series analysis face fundamental limitations when integrating the heterogeneous data types essential for comprehensive clinical understanding. Physiological signals must be interpreted within rich clinical contexts that include patient history, current medications, and treatment protocols, information typically stored as unstructured text that conventional time series models cannot effectively utilize. We propose MedTsLLM, a multimodal model that aims to address this critical gap by integrating numerical physiological signals with natural language clinical information through large language models (LLMs). Our framework incorporates patch reprogramming for time series-LLM alignment and introduces two key innovations: novel covariate handling strategies that capture complex physiological relationships, and contextual prompting mechanisms that incorporate patient-specific information. MedTsLLM addresses four clinically significant tasks within a unified architecture: semantic segmentation, boundary detection, anomaly detection, and classification. Through comprehensive evaluation across diverse medical domains, including ECG analysis, respiratory monitoring, and cardiac arrhythmia detection, our approach consistently outperforms state-of-the-art baselines across all tasks and datasets. These results demonstrate the transformative potential of multimodal LLMs for biomedical signal analysis, enabling clinicians to extract deeper insights from physiological data while leveraging comprehensive clinical context to enhance diagnostic accuracy, patient monitoring, and personalized treatment decisions.
arXiv (Cornell University) · 2024-07-19
preprint · Open access · Senior author
In many applied optimization settings, parameters that define the constraints may not guarantee the best possible solution, and superior solutions might exist that are infeasible for the given parameter values. Removing such constraints, re-optimizing, and evaluating the new solution may be insufficient, as the optimizer's preferences in selecting the existing solutions might be lost. To address this issue, we present an inverse optimization-based model that takes an observed solution as input and aims to improve upon it by projecting onto desired hyperplanes or expanding the feasible set while balancing the distance to the observed decision to preserve the optimizer's preferences. We demonstrate the applicability of the model in the context of radiation therapy treatment planning, an essential component of cancer treatment. Radiation therapy treatment planning is typically guided by expert-driven guidelines that define the optimization problem but remain mostly general. Our model provides an automated framework that learns new plans from available plans based on given clinical criteria, optimizing the desired effect without compromising the remaining constraints. The proposed approach is applied to a cohort of four prostate cancer patients, and the results demonstrate improvements in dose-volume histograms while maintaining comparable target coverage to clinically acceptable plans. By optimizing the parameters of the treatment planning problem and exploring the Pareto frontier, our methodology uncovers previously unattainable solutions that enhance organ-at-risk sparing without sacrificing target coverage. The framework's ability to handle multiple organs-at-risk and various dose-volume constraints highlights its flexibility and potential for application to diverse radiation therapy treatment planning scenarios.
Algorithms at the Bedside: Moving Past Development and Validation*
Pediatric Critical Care Medicine · 2024-03-01 · 2 citations
editorial
We need help. For seventy years, it has been known that humans can simultaneously handle only seven bits of information (1). Regardless that in 1956 a “bit” of information and in 2024 a “byte” of data are different, the number 7 has persisted through many cognitive psychology experiments. And if your mind is tied up managing a sudden noise (e.g., an alarm) and a question from a colleague, you can then handle only five more information bits. Much work has been done to understand how we “chunk” data into patterns (2), and the ability to find patterns both improves with experience and also has the same cognitive limit (3). However, more recent work shows our ability to handle chunks is limited to 3–5 (4) (and again, fewer when we are distracted). Not surprisingly, critical care physicians handle high information load by learning to “see patterns” (a.k.a. chunks) and as expertise increases, pattern recognition also increases (5). But even the most highly qualified experts have limited attention and memory availability, let alone limited knowledge given the ever-expanding scientific literature. Relying only on individual clinicians’ cognition, even if they have the highest expertise possible, is quite troublesome given the amount of data each clinician is asked to handle in every ICU. Manor-Shulman et al (6) showed about 1350 data points are documented daily on patients in the PICU. About 1500 data points are documented daily on children on mechanical ventilation. Not included in these numbers are the undocumented data (e.g., waveform data, waveform interactions, collegial and family discussions), which probably take the data loads up two or three orders of magnitude. Further, a typical attending intensivist often cares for ten or more patients. To do the multiplication, that is 13,000 to 15,000 data points for each clinician every day. To pretend any of us can ingest and analyze even most of the data is just that: pretend.
The promise of artificial intelligence (AI), and particularly machine learning, in critical care is its ability to ingest the documented and undocumented data including images and text and discover patterns in the data that clinicians now do not, and frankly cannot, recognize because clinicians are humans. Hence, the promise of AI in critical care is great. Yet AI’s contribution to bedside patient critical care remains absent. In this issue of Pediatric Critical Care Medicine, Chanci et al (7) use machine learning techniques to combine more than 50 parameters into a single number that “…predicts the need for intubation in children between 24 hours and up to 7 days after hospital admission.” Three questions about this (and every other predictive algorithm) are: 1) does this new data point reduce a clinician’s cognitive burden by reducing 50 numbers into one; 2) where, when, to whom, and how should this new data point be presented; and 3) when the algorithm is wrong or biased, how will the mistakes (either discovered by the clinicians, or worse, blindly accepted by the clinicians) be incorporated into algorithm improvements? Asked succinctly, does this new prediction algorithm help (question no. 1)? Answered succinctly, no. The data supporting “no” is in Supplemental Tables 4–6 in (7). For the 17,841 PICU stays, the algorithm correctly fired 921 times and correctly did not fire 8,856 times. Seven thousand eight hundred twenty times it incorrectly fired and 244 times it incorrectly did not fire (which includes the 227 “late positives” when the child was intubated before the algorithm fired). Considering the positive alerts, the precision of the algorithm, defined as true positive alerts divided by all positive alerts, is only 921/(921 + 7820) = 11% (also called the positive predictive value [PPV]).
If you change the alarm threshold to allow the sensitivity to fall to about 50% (or the alert fires only half the time when a child needs intubation), the PPV then rises to about 50%. Neither scenario helps. Continuing to hammer the point, Supplement Table 7 (7) shows that about 88% of the alerts are false, and on day 6 of admission, 98% of the alerts are false. Thus, it is an understatement when the authors comment, “While minimization of false positive alarms is needed, these patients may warrant increased monitoring and vigilance.” Yet, to follow up on the last part of the above sentence, the data presented show that 20% of the patients were intubated even before the alert fired; obviously, they were being monitored. Back to the questions, where, when, to whom, and how should this new data point be presented (question no. 2)? Asked differently, how should the algorithm output be incorporated into the workflow? Because all the parameters used in this algorithm are derived from the electronic medical record (EMR), a simple solution would seem to be to create a pop-up alert in the EMR. There are at least three problems with this solution that can be explained briefly by the concept of “work as imagined” vs. “work as done” from human factors engineering (8,9). First, there is often a charting delay between the time an event happens and the time it is documented. Thus, an alert might fire at 2 pm based on data that occurred at noon. Second, the alert may only be seen when a clinician signs into the EMR and in this case, the clinician may be signing in to place orders for intubation. Third, the vital sign inputs could be available with an analysis of the continuous monitor data; the monolithic EMRs do not incorporate data at high frequency (and much information is consequently lost). An answer to question no. 3 is even more complicated; when the algorithm is wrong, how will the mistakes be incorporated into algorithm improvements?
It is hard to even know when an algorithm is wrong. Often overlooked is the issue of “ground truth.” If you try to predict something and that something has a nebulous (i.e., noncomputable) definition, then any prediction must be calibrated against an approximation of ground truth. Further, the use of data to predict future clinician behavior (e.g., endotracheal intubation) using data that often also depends on clinician behavior (e.g., laboratory data sent only with a specific order) will also be nebulous given the wide variabilities of clinician behaviors (10). Incorporating mistakes into algorithm improvements will require continuous tracking of ground truth and routine algorithm recalibrations. In conclusion, we, as a field, need to move past “just” development and validation. For any article describing a new predictive analytic, the discussion section should have a paragraph beginning with, “And at the bedside, we believe this alert based on our in-depth human factors engineering investigations of clinical workflow and cognitive work should be…” even if the sentence is completed with “… ignored until the PPV is improved and the team-based workflow is better organized.” The following paragraph should start with, “To keep the algorithm current, we should include adaptive, continuous and active learning (among many others), seek a computable definition of ground truth … and discover the best clinical pathways” (11,12). This paragraph will be even harder to complete. Yet without both paragraphs, we will be doing remarkable math, but will neither be supporting clinicians nor improving patient care. We need help. AI can help if, and only if, it is designed and implemented considering the realities of clinical work and by using a participatory human-centered design.
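The alert arithmetic quoted in the editorial can be checked with a few lines of code. The sketch below takes the four counts from the text (921 correct alerts, 8,856 correct non-alerts, 244 missed cases, and the false-positive count of 7,820 used in the PPV formula) and computes the standard confusion-matrix ratios. The function name `alert_metrics` is made up for this example; sensitivity and specificity are not stated in the editorial and are derived here only from those counts.

```python
def alert_metrics(tp, fp, tn, fn):
    """Return standard confusion-matrix ratios for a binary alert."""
    return {
        "total": tp + fp + tn + fn,
        "ppv": tp / (tp + fp),          # precision: true alerts / all alerts
        "sensitivity": tp / (tp + fn),  # share of true cases that were alerted
        "specificity": tn / (tn + fp),  # share of non-cases left un-alerted
    }


# Counts as quoted in the editorial (false positives taken from its PPV formula).
metrics = alert_metrics(tp=921, fp=7820, tn=8856, fn=244)
for name, value in metrics.items():
    print(name, value if name == "total" else f"{value:.1%}")
```

With these counts the total matches the 17,841 PICU stays, and the PPV comes out at roughly 10.5%, which the editorial rounds to 11%.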
Frequent coauthors
- Dionne M. Aleman · University of Toronto · 20 shared
- David A. Jaffray · The University of Texas MD Anderson Cancer Center · 17 shared
- Mark Ruschin · 16 shared
- Hamid R. Ghaffari · University of Toronto · 15 shared
- Felix Parker · 8 shared
- Jeremiah S. Hinson · Johns Hopkins University · 6 shared
- Lauren Gardner · Johns Hopkins University · 5 shared
- Farzin Ahmadi · 5 shared
Awards & honors
- INFORMS Judith Liebman Award
- Johns Hopkins Discovery Award