
About
Madalina Fiterau is an Assistant Professor in the College of Information and Computer Sciences at UMass Amherst, where she leads the Information Fusion Lab. She completed her PhD in Machine Learning at Carnegie Mellon University and was a postdoctoral researcher at Stanford University before joining UMass. Her research focuses on hybrid models and the development of new deep learning methodologies to obtain salient representations from multimodal biomedical data, including time series, text, and images. Her work aims to advance healthcare by applying machine learning techniques to clinical data. She has been recognized with several awards, including the Marr Prize for Best Paper at ICCV 2015, the Star Research Award at the Annual Congress of the Society of Critical Care Medicine in 2016, and the Manning IALS Research Award in 2019. Madalina has also contributed to the machine learning community by co-organizing multiple editions of the NeurIPS workshop on Machine Learning in Healthcare and serving as an area chair for the Machine Learning in Healthcare Conference. Additionally, she administers the ml4health Google group. In her lab, the Information Fusion Lab, her team develops hybrid models for predicting clinical outcomes such as the onset of Alzheimer's disease and mortality in the ICU, as well as tools for analyzing, processing, and labeling medical data using expert knowledge.
Research topics
- Medicine
- Internal medicine
- Computer Science
- Machine Learning
- Biology
- Cardiology
- Genetics
- Geology
- Bioinformatics
Selected publications
Challenges in Understanding Modality Conflict in Vision-Language Models
ArXiv.org · 2025-09-02
preprint · Open access
This paper highlights the challenge of decomposing conflict detection from conflict resolution in Vision-Language Models (VLMs) and presents potential approaches, including using a supervised metric via linear probes and group-based attention pattern analysis. We conduct a mechanistic investigation of LLaVA-OV-7B, a state-of-the-art VLM that exhibits diverse resolution behaviors when faced with conflicting multimodal inputs. Our results show that a linearly decodable conflict signal emerges in the model's intermediate layers and that attention patterns associated with conflict detection and resolution diverge at different stages of the network. These findings support the hypothesis that detection and resolution are functionally distinct mechanisms. We discuss how such decomposition enables more actionable interpretability and targeted interventions for improving model robustness in challenging multimodal settings.
Audio-Visual Speech Separation via Bottleneck Iterative Network
ArXiv.org · 2025-07-09
preprint · Open access · Senior author
Integration of information from non-auditory cues can significantly improve the performance of speech-separation models. Such models often use deep modality-specific networks to obtain unimodal features, and risk being either too costly or lightweight but lacking in capacity. In this work, we present an iterative representation refinement approach called Bottleneck Iterative Network (BIN), a technique that repeatedly progresses through a lightweight fusion block while bottlenecking fusion representations by fusion tokens. This helps improve the capacity of the model while avoiding a major increase in model size, balancing model performance against training cost. We test BIN on challenging noisy audio-visual speech separation tasks, and show that our approach consistently outperforms state-of-the-art benchmark models with respect to SI-SDRi on the NTCD-TIMIT and LRS3+WHAM! datasets, while simultaneously achieving a reduction of more than 50% in training and GPU inference time across nearly all settings.
Journal of the American Medical Informatics Association · 2025-01-07 · 19 citations
article · Open access
The primary practice of healthcare artificial intelligence (AI) starts with model development, often using state-of-the-art AI, retrospectively evaluated using metrics lifted from the AI literature like AUROC and DICE score. However, good performance on these metrics may not translate to improved clinical outcomes. Instead, we argue for a better development pipeline constructed by working backward from the end goal of positively impacting clinically relevant outcomes using AI, leading to considerations of causality in model development and validation, and subsequently a better development pipeline. Healthcare AI should be "actionable," and the change in actions induced by AI should improve outcomes. Quantifying the effect of changes in actions on outcomes is causal inference. The development, evaluation, and validation of healthcare AI should therefore account for the causal effect of intervening with the AI on clinically relevant outcomes. Using a causal lens, we make recommendations for key stakeholders at various stages of the healthcare AI pipeline. Our recommendations aim to increase the positive impact of AI on clinical outcomes.
Augmenting Randomized Controlled Trials with Foundation Models as Synthetic Units
2025-10-12
article · Open access · Senior author
Randomized Controlled Trials (RCTs) are the gold standard for evaluating treatment effects. However, they are often costly and time-consuming to conduct, requiring extended durations to yield credible estimates. In this context, in silico experiments using foundation models offer a promising proxy for unobserved outcomes. Yet, directly relying on these simulations may result in invalid inferences. This work introduces a framework that leverages black-box foundation models as digital twins to enhance the efficiency of clinical trials. Our estimator is statistically valid and guarantees performance that is at least as good as standard estimators, regardless of the quality of the foundation model. We support our theoretical claims with an application to a clinical trial on the effects of donepezil.
Combining A/A Data and LLMs for Improved Content Evaluation
2025-05-08
article · Open access · Senior author
A/B testing to evaluate user preferences and engagement is a cornerstone of the modern digital landscape. However, in the current era, the feedback cycle is considerably shortened while the experimentation space expands significantly, necessitating novel and efficient ways to assess user engagement. A/A testing, which compares identical content variants, offers a complementary approach by establishing baselines for engagement metrics and identifying natural variability in user behavior. However, A/A tests inherently lack paired samples, limiting their direct applicability to standard preference alignment methods, which require positive and negative samples for the same context. To address this gap, we propose a novel utility theory framework that enables the integration of unpaired A/A data into content evaluation systems. By translating Large Language Model (LLM) rewards into a utility framework, our approach allows for the incorporation of A/A test results into LLMs.
Resolution-Aware Retrieval Augmented Zero-Shot Forecasting
ArXiv.org · 2025-10-19
preprint · Open access · Senior author
Zero-shot forecasting aims to predict outcomes for previously unseen conditions without direct historical data, posing a significant challenge for traditional forecasting methods. We introduce a Resolution-Aware Retrieval-Augmented Forecasting model that enhances predictive accuracy by leveraging spatial correlations and temporal frequency characteristics. By decomposing signals into different frequency components, our model employs resolution-aware retrieval, where lower-frequency components rely on broader spatial context, while higher-frequency components focus on local influences. This allows the model to dynamically retrieve relevant data and adapt to new locations with minimal historical context. Applied to microclimate forecasting, our model significantly outperforms traditional forecasting methods, numerical weather prediction models, and modern foundation time series models, achieving 71% lower MSE than HRRR and 34% lower MSE than Chronos on the ERA5 dataset. Our results highlight the effectiveness of retrieval-augmented and resolution-aware strategies, offering a scalable and data-efficient solution for zero-shot forecasting in microclimate modeling and beyond.
Scientific Reports · 2024-03-19 · 6 citations
article · Open access · Senior author
College students experience ever-increasing levels of stress, leading to a wide range of health problems. In this context, monitoring and predicting students' stress levels is crucial and, fortunately, made possible by the growing support for data collection via mobile devices. However, predicting stress levels from mobile phone data remains a challenging task, and off-the-shelf deep learning models are inapplicable or inefficient due to data irregularity, inter-subject variability, and the "cold start problem". To overcome these challenges, we developed a platform named Branched CALM-Net that aims to predict students' stress levels through dynamic clustering in a personalized manner. This is the first platform that leverages the branching technique in a multitask setting to achieve personalization and continuous adaptation. Our method achieves state-of-the-art performance in predicting student stress from mobile sensor data collected as part of the Dartmouth StudentLife study, with a ROC AUC 37% higher and a PR AUC surpassing that of the nearest baseline models. In the cold-start online learning setting, Branched CALM-Net outperforms other models, attaining an average F1 score of 87% with just 1 week of training data for a new student, which shows it is reliable and effective at predicting stress levels from mobile data.
arXiv (Cornell University) · 2024-12-16
preprint · Open access · Senior author
A crucial step in cohort studies is to extract the required cohort from one or more study datasets. This step is time-consuming, especially when a researcher is presented with a dataset that they have not previously worked with. When the cohort has to be extracted from multiple datasets, cohort extraction can be extremely laborious. In this study, we present an approach for partially automating cohort extraction from multiple electronic health record (EHR) databases. We formulate the guided multi-dataset cohort extraction problem in which selection criteria are first converted into queries, translating them from natural language text to language that maps to database entities. Then, using FLMs, columns of interest identified from the queries are automatically matched between the study databases. Finally, the generated queries are run across all databases to extract the study cohort. We propose and evaluate an algorithm for automating column matching on two large, popular, and publicly accessible EHR databases: MIMIC-III and eICU. Our approach achieves a high top-three accuracy of 92%, correctly matching 12 out of the 13 columns of interest, when using a small, pre-trained general-purpose language model. Furthermore, this accuracy is maintained even as the search space (i.e., size of the database) increases.
A/B testing under Interference with Partial Network Information
arXiv (Cornell University) · 2024-04-16
preprint · Open access · Senior author
A/B tests are often required to be conducted on subjects that might have social connections. Examples include experiments on social media, and medical or social interventions to control the spread of an epidemic. In such settings, the SUTVA assumption for randomized controlled trials is violated due to network interference, or spill-over effects, as treatments to group A can potentially also affect the control group B. When the underlying social network is known exactly, prior works have demonstrated how to conduct A/B tests adequately to estimate the global average treatment effect (GATE). However, in practice, it is often impossible to obtain knowledge about the exact underlying network. In this paper, we present UNITE, a novel estimator that relaxes this assumption and can identify GATE while relying only on knowledge of the superset of neighbors for any subject in the graph. Through theoretical analysis and extensive experiments, we show that the proposed approach performs better than standard estimators.
Optimal fusion of genotype and drug embeddings in predicting cancer drug response
Briefings in Bioinformatics · 2024-03-27 · 3 citations
article · Open access
Predicting cancer drug response using both genomics and drug features has shown some success compared to using genomics features alone. However, there has been limited research on how best to combine or fuse the two types of features. Using a visible neural network with two deep learning branches for gene and drug features as the base architecture, we experimented with different fusion functions and fusion points. Our experiments show that injecting multiplicative relationships between gene and drug latent features into the original concatenation-based architecture DrugCell significantly improved the overall predictive performance and outperformed other baseline models. We also show that different fusion methods respond differently to different fusion points, indicating that the relationship between drug features and different hierarchical biological levels of gene features is optimally captured using different methods. Considering both predictive performance and runtime speed, tensor product partial is the best-performing fusion function to combine late-stage representations of drug and gene features to predict cancer drug response.
Frequent coauthors
- Jason Fries · 14 shared
- Artur Dubrawski · 14 shared
- Gilles Clermont · University of Pittsburgh · 11 shared
- Iman Deznabi · 10 shared
- Ke Xiao · 9 shared
- Scott L. Delp · Stanford University · 9 shared
- Michael R. Pinsky · 9 shared
- Christian Dina · Centre Hospitalier Universitaire de Nantes · 8 shared
Labs
Computational Social Science Institute · PI
The lab focuses on computational social science research.
Education
Ph.D., Machine Learning
Carnegie Mellon University
Awards & honors
- Marr Prize for Best Paper at ICCV 2015
- Star Research Award at the Annual Congress of the Society of…
- Manning IALS Research Award in 2019
- IALS Midigrant in 2022
- Institute of Diversity Sciences Seed Grant in 2023