Carlos Guestrin

Machine Learning, Explainability, Fairness and ML Systems

Stanford University · Learning, Design, and Technology

Active 2000–2026

h-index: 86

Citations: 58.5k

Papers: 272 (36 in last 5y)

Funding: $1.6M

About

Carlos Guestrin is a Professor of Computer Science at Stanford University and serves as the Director of the Stanford AI Lab (SAIL). He holds the title of Fortinet Founders Professor and is a Senior Fellow at the Stanford Institute for Human-Centered AI (HAI). Guestrin is also the Chief Scientist of Visual Layer and Virtue AI. He is a member of the National Academy of Engineering. His research focuses on machine learning methods, with particular emphasis on explainability, fairness, and ethics of AI, as well as the development of machine learning systems.

Research topics

Computer Science
Business
Virology
Demographic economics
Medicine
Telecommunications
Economics
Environmental health

Selected publications

ALMo: Interactive Aim-Limit-Defined, Multi-Objective System for Personalized High-Dose-Rate Brachytherapy Treatment Planning and Visualization for Cervical Cancer
arXiv (Cornell University) · 2026-02-14
article · Open access · Senior author
In complex clinical decision-making, clinicians must often track a variety of competing metrics defined by aim (ideal) and limit (strict) thresholds. Sifting through these high-dimensional tradeoffs to infer the optimal patient-specific strategy is cognitively demanding and historically prone to variability. In this paper, we address this challenge within the context of High-Dose-Rate (HDR) brachytherapy for cervical cancer, where planning requires strictly managing radiation hot spots while balancing tumor coverage against organ sparing. We present ALMo (Aim-Limit-defined Multi-Objective system), an interactive decision support system designed to infer and operationalize clinician intent. ALMo employs a novel optimization framework that minimizes manual input through automated parameter setup and enables flexible control over toxicity risks. Crucially, the system allows clinicians to navigate the Pareto surface of dosimetric tradeoffs by directly manipulating intuitive aim and limit values. In a retrospective evaluation of 25 clinical cases, ALMo generated treatment plans that consistently met or exceeded manual planning quality, with 65% of cases demonstrating dosimetric improvements. Furthermore, the system significantly enhanced efficiency, reducing average planning time to approximately 17 minutes, compared to the conventional 30-60 minutes. While validated in brachytherapy, ALMo demonstrates a generalized framework for streamlining interaction in multi-criteria clinical decision-making.
Discovering Implicit Large Language Model Alignment Objectives
arXiv (Cornell University) · 2026-02-17
preprint · Open access · Senior author
Large language model (LLM) alignment relies on complex reward signals that often obscure the specific behaviors being incentivized, creating critical risks of misalignment and reward hacking. Existing interpretation methods typically rely on pre-defined rubrics, risking the omission of "unknown unknowns", or fail to identify objectives that comprehensively cover and are causal to the model behavior. To address these limitations, we introduce Obj-Disco, a framework that automatically decomposes an alignment reward signal into a sparse, weighted combination of human-interpretable natural language objectives. Our approach utilizes an iterative greedy algorithm to analyze behavioral changes across training checkpoints, identifying and validating candidate objectives that best explain the residual reward signal. Extensive evaluations across diverse tasks, model sizes, and alignment algorithms demonstrate the framework's robustness. Experiments with popular open-source reward models show that the framework consistently captures > 90% of reward behavior, a finding further corroborated by human evaluation. Additionally, a case study on alignment with an open-source reward model reveals that Obj-Disco can successfully identify latent misaligned incentives that emerge alongside intended behaviors. Our work provides a crucial tool for uncovering the implicit objectives in LLM alignment, paving the way for more transparent and safer AI development.
Reinforcement Learning via Self-Distillation
arXiv (Cornell University) · 2026-01-28
article · Open access
Large language models are increasingly post-trained with reinforcement learning in verifiable domains such as code and math. Yet, current methods for reinforcement learning with verifiable rewards (RLVR) learn only from a scalar outcome reward per attempt, creating a severe credit-assignment bottleneck. Many verifiable environments actually provide rich textual feedback, such as runtime errors or judge evaluations, that explain why an attempt failed. We formalize this setting as reinforcement learning with rich feedback and introduce Self-Distillation Policy Optimization (SDPO), which converts tokenized feedback into a dense learning signal without any external teacher or explicit reward model. SDPO treats the current model conditioned on feedback as a self-teacher and distills its feedback-informed next-token predictions back into the policy. In this way, SDPO leverages the model's ability to retrospectively identify its own mistakes in-context. Across scientific reasoning, tool use, and competitive programming on LiveCodeBench v6, SDPO improves sample efficiency and final accuracy over strong RLVR baselines. Notably, SDPO also outperforms baselines in standard RLVR environments that only return scalar feedback by using successful rollouts as implicit feedback for failed attempts. Finally, applying SDPO to individual questions at test time accelerates discovery on difficult binary-reward tasks, achieving the same discovery probability as best-of-k sampling or multi-turn conversations with 3x fewer attempts.
PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models
ArXiv.org · 2026-02-03
article · Open access
Relational Foundation Models (RFMs) facilitate data-driven decision-making by learning from complex multi-table databases. However, the diverse relational databases needed to train such models are rarely public due to privacy constraints. While there are methods to generate synthetic tabular data of arbitrary size, incorporating schema structure and primary-foreign key connectivity for multi-table generation remains challenging. Here we introduce PluRel, a framework to synthesize multi-tabular relational databases from scratch. In a step-by-step fashion, PluRel models (1) schemas with directed graphs, (2) inter-table primary-foreign key connectivity with bipartite graphs, and (3) feature distributions in tables via conditional causal mechanisms. The design space across these stages supports the synthesis of a wide range of diverse databases, while being computationally lightweight. Using PluRel, we observe for the first time that (1) RFM pretraining loss exhibits power-law scaling with the number of synthetic databases and total pretraining tokens, (2) scaling the number of synthetic databases improves generalization to real databases, and (3) synthetic pretraining yields strong base models for continued pretraining on real databases. Overall, our framework and results position synthetic data scaling as a promising paradigm for RFMs.
Explainability in the Wild, or Wild Explanations? Evidence From Predicting Tax Evasion
Harvard Data Science Review · 2026-02-18
article · Open access
As artificial intelligence algorithms become more prevalent in high-stakes risk assessment, policymakers have increasingly relied on explainability tools for interpretability. Despite growing mandates that AI-based decisions include explanations, there remains little empirical evidence demonstrating the effectiveness of these techniques in real-world applications. This gap often stems from the absence of a clear ground truth for evaluating explanations. In this work, we present an empirical evaluation of explanation techniques in collaboration with the United States Internal Revenue Service (IRS). Using real, line-by-line IRS audits from randomly-selected taxpayers, we decompose one component of aggregate tax under-reporting into its constituent line-item misreporting and apply explainability techniques to recover these risks; the aggregate risk is a function of the constituent risk. Our study makes three contributions. First, we empirically evaluate how well local explanation models recover true constituent risks. Second, we compare local explanation models to estimating constituent risks directly. Finally, we situate these findings in a practical setting where explanations are critical not only for transparency but also as guidance for the users of the model's predictions. Our analysis reveals that the quality of local explanations is tied to the quality of the underlying model. Yet even with a theoretically perfect underlying model, local explanations still fail to accurately capture the true risk. While directly estimating constituent risks may yield more accurate results, simplistic rule-based heuristics often overlook the complexity of risk. These findings highlight the need for thoughtful application of explanation techniques in high-risk domains, where errors can have significant consequences.
Learning to Discover at Test Time
Open MIND · 2026-01-22
preprint
How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to 2× faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) a denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results that required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem.
Outcome Rewards Do Not Guarantee Verifiable or Causally Important Reasoning
ArXiv.org · 2026-04-23
article · Open access
Reinforcement Learning from Verifiable Rewards (RLVR) on chain-of-thought reasoning has become a standard part of language model post-training recipes. A common assumption is that the reasoning chains trained through RLVR reliably represent how a model gets to its answer. In this paper, we develop two metrics for critically examining this assumption: Causal Importance of Reasoning (CIR), which measures the cumulative effect of reasoning tokens on the final answer, and Sufficiency of Reasoning (SR), which measures whether a verifier can arrive at an unambiguous answer based on the reasoning alone. Through experiments with the Qwen2.5 model series and ReasoningGym tasks, we find that: (1) while RLVR does improve task accuracy, it does not reliably improve CIR or SR, calling the role of reasoning in model performance into question; (2) a small amount of SFT before RLVR can be a remedy for low CIR and SR; and (3) CIR and SR can be improved even without SFT by applying auxiliary CIR/SR rewards on top of the outcome-based reward. This joint reward matches the accuracy of RLVR while also leading to causally important and sufficient reasoning. These results show that RLVR does not always lead models to rely on reasoning in the way that is commonly thought, but this issue can be remedied with simple modifications to the post-training procedure.

Recent grants

CAREER: Thinking that is "just right": Query-Specific Probabilistic Reasoning and its Application to Large-Scale Sensor Networks
NSF · $506k · 2006–2012
CSR-EHS: Collaborative Research: A General, Efficient and Robust Platform for Enabling Control Applications in Sensor Networks
NSF · $200k · 2005–2009
RI: Small: GraphLab 2: An Abstraction and System for Large-Scale Parallel Machine Learning on Natural Graphs
NSF · $450k · 2012–2017
NeTS-NOSS: SNI: A General and Robust Networking Architecture for Distributed Data Processing in Sensor Networks
NSF · $422k · 2006–2010

Frequent coauthors

Andreas Krause
49 shared
Jure Leskovec
Stanford University
26 shared
Joseph M. Hellerstein
University of California, Berkeley
25 shared
Daphne Koller
23 shared
Jeanne M. VanBriesen
22 shared
Paul S. Fischbeck
Decision Sciences (United States)
22 shared
Shannon L. Isovitsch
Forbes Hospital
20 shared
Mitchell J. Small
Carnegie Mellon University
20 shared

Labs

Carlos Guestrin · PI

Education

Ph.D., Computer Science
Stanford University

Awards & honors

Member of the National Academy of Engineering
Fortinet Founders Professor, Stanford
