Mohsen Bayati

Verified

Stanford University · Business

Active 2005–2026

h-index 31

Citations 4.9k

Papers 150 · 43 last 5y

Funding $1000k

Faculty page

About

Mohsen Bayati is the Carl and Marilynn Thoma Professor of Operations, Information & Technology at Stanford University. He also holds courtesy appointments as Professor of Electrical Engineering in the School of Engineering and as Professor of Radiation Oncology in the School of Medicine. His research interests span applied machine learning in healthcare, graphical models and message-passing algorithms, and the mathematics of learning and decision-making. His work aims to improve healthcare through data-driven learning and decision models, to develop mathematical models and algorithms for experimentation, learning, and personalized decision-making, and to advance statistical inference methods. His research covers healthcare analytics, network models, and algorithmic decision-making. His honors include the Stanford GSB PhD Faculty Distinguished Service Award and multiple Stanford GSB Faculty Scholar titles.

Research topics

Computer science
Mathematics
Mathematical optimization
Algorithm
Combinatorics

Selected publications

Factors affecting patients’ trust in physicians in the cases of prescribing invasive interventions among musculoskeletal patients referring to clinics in Shiraz, 2023: a discrete choice experiment
Scientific Reports · 2026-04-30
articleOpen accessSenior author
Trust is an essential component of the physician-patient relationship, influencing treatment compliance and satisfaction. Lack of trust, particularly in surgical prescriptions, can lead to more visits, lower treatment quality, and higher healthcare costs. This study aimed to identify the attributes influencing musculoskeletal patients' trust in surgeons. A discrete choice experiment (DCE) was administered to 400 musculoskeletal patients. Attributes were identified through a literature review and 28 patient interviews, and a D-efficient fractional factorial design produced eight choice sets. Participants completed eight binary choice tasks; data were analyzed using conditional logit models (Stata 17), both overall and by subgroup. The following attributes were significantly associated with patient trust in physicians (P < 0.05): the physician's reputation (OR = 3.363, vs. not being famous), a male physician (OR = 1.307), the physician performing the surgery personally (OR = 1.366, vs. the surgery possibly being performed by another physician), appropriate communication (OR = 1.212), and recommendation by friends/relatives (OR = 3.137) or by other physicians (OR = 2.099), each vs. no recommendation. Men place more trust in famous physicians (OR = 3.597) than women do (OR = 3.141). Trust in famous physicians is higher in all age groups, especially among patients over 51 (OR = 4.197). Rural residents value a physician's reputation more (OR = 6.110) than urban residents (OR = 3.256). The results indicate that several attributes shape patients' trust. Overall, the physician's reputation, being a male physician, performing the surgery personally, communicating appropriately with the patient, and recommendations by friends/relatives and other physicians were strongly associated with patients' trust in the surgeon's prescription.
Publisher DOI
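To make the reported odds ratios concrete: in a conditional logit model each OR equals exp(β) for the attribute's coefficient β, and in a binary choice task the probability of choosing profile A over profile B is exp(V_A) / (exp(V_A) + exp(V_B)), where V is the sum of β times the attribute indicators. The sketch below assumes dummy-coded attributes, no interactions, and that the pooled ORs from the abstract can be combined in a single utility; the function name and attribute keys are hypothetical and are not taken from the study's Stata analysis.

```python
import math


def choice_probability(odds_ratios, profile_a, profile_b):
    """Probability that profile A is chosen over profile B in a binary
    conditional-logit choice task.

    odds_ratios: dict attribute -> odds ratio (OR = exp(beta)).
    profile_a, profile_b: dict attribute -> 0/1 indicator (missing means 0).
    """
    # Recover each logit coefficient from its odds ratio: beta = ln(OR).
    betas = {name: math.log(value) for name, value in odds_ratios.items()}
    # Deterministic utility of each alternative: V = sum_k beta_k * x_k.
    v_a = sum(beta * profile_a.get(name, 0) for name, beta in betas.items())
    v_b = sum(beta * profile_b.get(name, 0) for name, beta in betas.items())
    # exp(V_A) / (exp(V_A) + exp(V_B)), rewritten as a logistic function.
    return 1.0 / (1.0 + math.exp(v_b - v_a))


# Pooled odds ratios reported in the abstract (hypothetical attribute keys).
ORS = {"famous": 3.363, "recommended_by_friends": 3.137}

# A famous surgeon vs. an otherwise identical surgeon who is not famous.
print(round(choice_probability(ORS, {"famous": 1}, {}), 3))  # prints 0.771

# A famous surgeon vs. a non-famous surgeon recommended by friends/relatives.
print(round(choice_probability(ORS, {"famous": 1}, {"recommended_by_friends": 1}), 3))  # prints 0.517
```

When only one attribute differs, the probability reduces to OR / (1 + OR), which is why an OR of about 3.4 corresponds to choosing the famous surgeon roughly 77% of the time under these assumptions.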
Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context
arXiv (Cornell University) · 2026-04-22
preprintOpen access
Many applications of LLM-based text regression require predicting a full conditional distribution rather than a single point value. We study distributional regression under empirical-quantile supervision, where each input is paired with multiple observed quantile outcomes, and the target distribution is represented by a dense grid of quantiles. We address two key limitations of current approaches: the lack of local grounding for distribution estimates, and the reliance on shared representations that create an indirect bottleneck between inputs and quantile outputs. In this paper, we introduce Quantile Token Regression, which, to our knowledge, is the first work to insert dedicated quantile tokens into the input sequence, enabling direct input-output pathways for each quantile through self-attention. We further augment these quantile tokens with retrieval, incorporating semantically similar neighbor instances and their empirical distributions to ground predictions with local evidence from similar instances. We also provide the first theoretical analysis of loss functions for quantile regression, clarifying which distributional objectives each optimizes. Experiments on the Inside Airbnb and StackSample benchmark datasets with LLMs ranging from 1.7B to 14B parameters show that quantile tokens with neighbors consistently outperform baselines (~4 points lower MAPE and 2x narrower prediction intervals), with especially large gains on smaller and more challenging datasets where quantile tokens produce substantially sharper and more accurate distributions.
Publisher DOI
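Quantile regression is commonly trained with the pinball (check) loss, ρ_τ(u) = max(τu, (τ - 1)u) with u = y - q̂_τ, averaged over the grid of quantile levels. The sketch below shows that standard objective for a dense grid; it is one common choice, not necessarily the loss the paper recommends after its analysis, and it does not model the quantile tokens or neighbor retrieval. The function name and the 19-level grid are illustrative assumptions.

```python
def pinball_loss(y, q_pred, taus):
    """Mean pinball (quantile) loss over observations and quantile levels.

    y: list of observed outcomes, length n.
    q_pred: list of n rows; row i holds the predicted quantiles for y[i],
        one per level in taus.
    taus: list of quantile levels in (0, 1).
    """
    total = 0.0
    count = 0
    for y_i, row in zip(y, q_pred):
        for tau, q in zip(taus, row):
            u = y_i - q  # positive when the predicted quantile is too low
            # rho_tau(u) = max(tau * u, (tau - 1) * u): under-prediction costs
            # tau per unit, over-prediction costs (1 - tau) per unit.
            total += max(tau * u, (tau - 1.0) * u)
            count += 1
    return total / count


# A dense grid of levels 0.05, 0.10, ..., 0.95 (illustrative choice).
TAUS = [i / 20 for i in range(1, 20)]

# Predictions that match the outcome exactly at every level incur zero loss.
print(pinball_loss([5.0], [[5.0] * len(TAUS)], TAUS))  # prints 0.0
```

The asymmetry is what makes each output a quantile: minimizing the expected loss at level τ drives q̂_τ toward the τ-th quantile of the conditional distribution.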
On Evolution-Based Models for Experimentation Under Interference
ArXiv.org · 2025-11-26
preprintOpen accessSenior author
Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or social systems, the interaction pathways driving these interference structures remain largely unobserved. We argue that for identifying population-level causal effects, it is not necessary to recover the exact network structure; instead, it suffices to characterize how those interactions contribute to the evolution of outcomes. Building on this principle, we study an evolution-based approach that investigates how outcomes change across observation rounds in response to interventions, hence compensating for missing network information. Using an exposure-mapping perspective, we give an axiomatic characterization of when the empirical distribution of outcomes follows a low-dimensional recursive equation, and identify minimal structural conditions under which such evolution mappings exist. We frame this as a distributional counterpart to difference-in-differences. Rather than assuming parallel paths for individual units, it exploits parallel evolution patterns across treatment scenarios to estimate counterfactual trajectories. A key insight is that treatment randomization plays a role beyond eliminating latent confounding; it induces an implicit sampling from hidden interference channels, enabling consistent learning about heterogeneous spillover effects. We highlight causal message passing as an instantiation of this method in dense networks while extending to more general interference structures, including influencer networks where a small set of units drives most spillovers. Finally, we discuss the limits of this approach, showing that strong temporal trends or endogenous interference can undermine identification.
Publisher OA PDF DOI
Qualitative Verification of Machine Learning-Based Burnout Predictors in Primary Care Physicians: An Exploratory Study
Applied Clinical Informatics · 2025-04-28 · 3 citations
articleOpen access
Electronic health record (EHR) usage measures may quantify physician activity at scale and predict practice settings with a high risk for physician burnout, but their relation to physicians' experiences is poorly understood. This study aimed to explore the EHR-related experiences and well-being of primary care physicians in comparison to EHR usage measures identified as important for predicting burnout by a machine learning model. This was an exploratory qualitative study with semi-structured interviews of primary care physicians and clinic managers from a large academic health system and its community physician partners. We included primary care clinics with high burnout scores, low burnout scores, or large changes in burnout scores between 2020 and 2022, relative to all primary care clinics in the health system. We conducted inductive and deductive coding of interview responses using a priori themes related to the machine learning model categories of patient load, documentation burden, messaging burden, orders, and physician distress and fulfillment. Interviews with 16 physicians and 4 clinic managers identified burdens related to three dominant themes: (1) messaging and documentation burdens are high and require more time than most physicians have available during standard working hours; (2) while EHR-related burdens are high, they also provide patient-care benefits; and (3) turnover and insufficient staffing exacerbate time demands associated with patient load. Dimensions that are difficult to quantify, such as a perceived imbalance between job demands and individual resources, also contribute to burnout and were consistent across all themes. EHR-related work burden, largely quantifiable through EHR usage measures, is a major source of distress among primary care physicians. Organizational recognition of this work, as well as staffing and support to predict the associated work burden, may increase professional fulfillment and reduce burnout among primary care physicians.
Publisher OA PDF DOI
Scaling Clinician-Grade Feature Generation from Clinical Notes with Multi-Agent Language Models
ArXiv.org · 2025-08-03
preprintOpen accessSenior author
Developing accurate clinical prediction models is often bottlenecked by the difficulty of deriving meaningful structured features from unstructured EHR notes, a process that traditionally requires manual, unscalable clinical abstraction. In this study, we first established a rigorous patient-level Clinician Feature Generation (CFG) protocol, in which domain experts manually reviewed notes to define and extract nuanced features for a cohort of 147 patients with prostate cancer. As a high-fidelity ground truth, this labor-intensive process provided the blueprint for SNOW (Scalable Note-to-Outcome Workflow), a transparent multi-agent large language model (LLM) system designed to autonomously mimic the iterative reasoning and validation workflow of clinical experts. On 5-year cancer recurrence prediction, SNOW (AUC-ROC 0.767) achieved performance comparable to manual CFG (0.762) and outperformed structured baselines, clinician-guided LLM extraction, and six representational feature generation (RFG) approaches. Once configured, SNOW produced the full patient-level feature table in 12 hours with 5 hours of clinician oversight, reducing human expert effort by approximately 48-fold versus manual CFG. To test scalability where manual CFG is infeasible, we deployed SNOW on an external heart failure with preserved ejection fraction (HFpEF) cohort from MIMIC-IV (n=2,084); without task-specific tuning, SNOW generated prognostic features that outperformed baseline and RFG methods for 30-day (SNOW: 0.851) and 1-year (SNOW: 0.763) mortality prediction. These results demonstrate that a modular LLM agent-based system can scale expert-level feature generation from clinical notes, while enabling interpretable use of unstructured EHR text in outcome prediction and preserving generalizability across a variety of settings and conditions.
Publisher OA PDF DOI
Can We Validate Counterfactual Estimations in the Presence of General Network Interference?
ArXiv.org · 2025-02-03
preprintOpen accessSenior author
Randomized experiments have become a cornerstone of evidence-based decision-making in contexts ranging from online platforms to public health. However, in experimental settings with network interference, a unit's treatment can influence outcomes of other units, challenging both causal effect estimation and its validation. Classic validation approaches fail as outcomes are only observable under a single treatment scenario and exhibit complex correlation patterns due to interference. To address these challenges, we introduce a framework that facilitates the use of machine learning tools for both estimation and validation in causal inference. Central to our approach is the new distribution-preserving network bootstrap, a theoretically-grounded technique that generates multiple statistically-valid subpopulations from a single experiment's data. This amplification of experimental samples enables our second contribution: a counterfactual cross-validation procedure. This procedure adapts the principles of model validation to the unique constraints of causal settings, providing a rigorous, data-driven method for selecting and evaluating estimators. We extend recent causal message-passing developments by incorporating heterogeneous unit-level characteristics and varying local interactions, ensuring reliable finite-sample performance through non-asymptotic analysis. Additionally, we develop and publicly release a comprehensive benchmark toolbox featuring diverse experimental environments, from networks of interacting AI agents to ride-sharing applications. These environments provide known ground truth values while maintaining realistic complexities, enabling systematic evaluation of causal inference methods. Extensive testing across these environments demonstrates our method's robustness to diverse forms of network interference.
Publisher OA PDF DOI
Quantile Regression with Large Language Models for Price Prediction
2025-01-01
articleOpen access
Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression, but existing approaches primarily focus on point estimates and lack systematic comparison across different methods. We investigate probabilistic regression using LLMs for unstructured inputs, addressing challenging text-to-distribution prediction tasks such as price estimation, where both nuanced text understanding and uncertainty quantification are critical. We propose a novel quantile regression approach that enables LLMs to produce full predictive distributions, improving upon traditional point estimates. Through extensive experiments across three diverse price prediction datasets, we demonstrate that a Mistral-7B model fine-tuned with quantile heads significantly outperforms traditional approaches for both point and distributional estimation, as measured by three established metrics each for prediction accuracy and distributional calibration. Our systematic comparison of LLM approaches, model architectures, training approaches, and data scaling reveals that Mistral-7B consistently outperforms encoder architectures, embedding-based methods, and few-shot learning methods. Our experiments also reveal the effectiveness of LLM-assisted label correction in achieving human-level accuracy without systematic bias. Our curated datasets are made available at https://github.com/vnik18/llm-price-quantile-reg/ to support future research.
Publisher OA PDF DOI
Estimating Total Effects in Bipartite Experiments with Spillovers and Partial Eligibility
ArXiv.org · 2025-11-14
preprintOpen access
We study randomized experiments in bipartite systems where only a subset of treatment-side units are eligible for assignment while all units continue to interact, generating interference. We formalize eligibility-constrained bipartite experiments and define estimands aligned with full deployment: the Primary Total Treatment Effect (PTTE) on eligible units and the Secondary Total Treatment Effect (STTE) on ineligible units. Under randomization within the eligible set, we give identification conditions and develop interference-aware ensemble estimators that combine exposure mappings, generalized propensity scores, and flexible machine learning. We further introduce a projection that links treatment- and outcome-level estimands; this mapping is exact under a Linear Additive Edges condition and enables estimation on the (typically much smaller) treatment side with deterministic aggregation to outcomes. In simulations with known ground truth across realistic exposure regimes, the proposed estimators recover PTTE and STTE with low bias and variance and reduce the bias that could arise when interference is ignored. Two field experiments illustrate practical relevance: our method corrects the direction of expected interference bias for a pre-specified metric in both studies and reverses the sign and significance of the primary decision metric in one case.
Publisher OA PDF DOI
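Two of the ingredients named above, exposure mappings and generalized propensity scores, can be illustrated with a textbook Horvitz-Thompson (inverse-propensity) estimator. The sketch assumes each eligible treatment-side unit is treated independently with probability p, and uses a hypothetical exposure mapping: a measured unit is "fully exposed" if every eligible unit it interacts with is treated and "unexposed" if none are. The weighted difference then targets a full-deployment vs. no-deployment effect. This is not the paper's ensemble estimator: it omits the machine-learning adjustment and the treatment-to-outcome projection, and all names are illustrative.

```python
def ht_total_effect(outcomes, eligible_neighbors, assignment, p):
    """Horvitz-Thompson estimate of a full-deployment vs. no-deployment effect.

    outcomes: list of observed outcomes Y_i, one per measured unit.
    eligible_neighbors: list of lists; entry i holds the ids of the eligible
        treatment-side units that unit i interacts with (hypothetical input).
    assignment: dict mapping each eligible unit id -> 0/1 treatment.
    p: independent Bernoulli treatment probability for each eligible unit.
    """
    total = 0.0
    for y, nbrs in zip(outcomes, eligible_neighbors):
        k = len(nbrs)
        treated = sum(assignment[j] for j in nbrs)
        # Exposure "full": all k eligible neighbors treated, probability p**k.
        if treated == k:
            total += y / p ** k
        # Exposure "none": no eligible neighbor treated, probability (1-p)**k.
        # Deliberately a separate `if`: with k == 0 both branches fire and cancel.
        if treated == 0:
            total -= y / (1.0 - p) ** k
    return total / len(outcomes)


# Toy bipartite graph: outcome unit 0 interacts with eligible units "a" and "b",
# unit 1 only with "b", and unit 2 with no eligible unit.
outcomes = [10.0, 4.0, 7.0]
neighbors = [["a", "b"], ["b"], []]
print(ht_total_effect(outcomes, neighbors, {"a": 1, "b": 1}, p=0.5))  # prints 16.0
```

Because the propensity of full exposure shrinks as p to the power k, units linked to many eligible units receive very large weights, so this plain estimator can have high variance; that limitation is one reason to combine exposure mappings with flexible outcome models.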
Quantile Regression with Large Language Models for Price Prediction
ArXiv.org · 2025-06-07
preprintOpen access
Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression, but existing approaches primarily focus on point estimates and lack systematic comparison across different methods. We investigate probabilistic regression using LLMs for unstructured inputs, addressing challenging text-to-distribution prediction tasks such as price estimation where both nuanced text understanding and uncertainty quantification are critical. We propose a novel quantile regression approach that enables LLMs to produce full predictive distributions, improving upon traditional point estimates. Through extensive experiments across three diverse price prediction datasets, we demonstrate that a Mistral-7B model fine-tuned with quantile heads significantly outperforms traditional approaches for both point and distributional estimations, as measured by three established metrics each for prediction accuracy and distributional calibration. Our systematic comparison of LLM approaches, model architectures, training approaches, and data scaling reveals that Mistral-7B consistently outperforms encoder architectures, embedding-based methods, and few-shot learning methods. Our experiments also reveal the effectiveness of LLM-assisted label correction in achieving human-level accuracy without systematic bias. Our curated datasets are made available at https://github.com/vnik18/llm-price-quantile-reg/ to support future research.
Publisher OA PDF DOI
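Accuracy and calibration of a predicted price distribution can be summarized in several ways. The abstract does not name its metrics, so the sketch below assumes three common ones: MAPE of a point prediction (for example the predicted median), empirical coverage of a central prediction interval (for example the interval between the predicted 0.1 and 0.9 quantiles, nominally 80%), and mean interval width. Function names and data are illustrative.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def interval_coverage_and_width(y_true, lower, upper):
    """Fraction of targets inside [lower, upper] and the mean interval width."""
    inside = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return inside / len(y_true), sum(widths) / len(widths)


# Hypothetical held-out prices with predicted medians and 0.1 / 0.9 quantiles.
prices = [100.0, 200.0, 50.0]
medians = [110.0, 180.0, 50.0]
q10 = [90.0, 150.0, 60.0]
q90 = [130.0, 210.0, 80.0]

print(round(mape(prices, medians), 2))  # prints 6.67
print(interval_coverage_and_width(prices, q10, q90))  # coverage 2/3, mean width 40.0
```

A well-calibrated 80% interval should cover roughly 80% of held-out prices; among models with similar coverage, narrower intervals indicate a sharper predictive distribution.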

Recent grants

CAREER: Algorithms and Decision Models for Learning in Health Care Systems
NSF · $500k · 2016–2022
EAGER: Data-Driven Learning and Decision Making in Healthcare
NSF · $300k · 2014–2017
ICES: Small: Collaborative Research: Data-driven mechanisms in healthcare
NSF · $200k · 2012–2015

Awards & honors

PhD Faculty Distinguished Service Award, Stanford GSB, 2024
Younger Family Faculty Scholar, Stanford GSB, 2020–21
Younger Family Faculty Scholar, Stanford GSB, 2019–20
Spence Faculty Scholar, Stanford GSB, 2015–16
