Christopher Potts

Verified

Stanford University · Symbolic Systems

Active 2001–2026

h-index: 50

Citations: 25.3k

Papers: 307 (158 in the last 5 years)

Funding: $1.2M

About

Christopher Potts is a Professor of Linguistics at Stanford University. He holds a B.A. in Linguistics with a German minor from New York University, obtained in 1999, and an M.A. and Ph.D. in Linguistics from the University of California, Santa Cruz, completed in 2000 and 2003 respectively. His academic focus includes applied logic, artificial intelligence, cognitive science, natural language, and philosophical foundations. In addition to his role in the Department of Linguistics, he is a member of the Bio-X Faculty and an affiliate of the Institute for Human-Centered Artificial Intelligence (HAI).

Research topics

Computer Science
Artificial Intelligence
Political Science
Data science
Management science
Engineering
Psychology
Engineering ethics
Law

Selected publications

Counterfactual Simulation Training for Chain-of-Thought Faithfulness
Open MIND · 2026-02-24
preprint · Senior author
Inspecting Chain-of-Thought reasoning is among the most common means of understanding why an LLM produced its output. But well-known problems with CoT faithfulness severely limit what insights can be gained from this practice. In this paper, we introduce a training method called Counterfactual Simulation Training (CST), which aims to improve CoT faithfulness by rewarding CoTs that enable a simulator to accurately predict a model's outputs over counterfactual inputs. We apply CST in two settings: (1) CoT monitoring with cue-based counterfactuals, to detect when models rely on spurious features, reward hack, or are sycophantic, and (2) counterfactual simulation over generic model-based counterfactuals, to encourage models to produce more faithful, generalizable reasoning in the CoT. Experiments with models up to 235B parameters show that CST can substantially improve monitor accuracy on cue-based counterfactuals (by 35 accuracy points) as well as simulatability over generic counterfactuals (by 2 points). We further show that: (1) CST outperforms prompting baselines, (2) rewriting unfaithful CoTs with an LLM is 5x more efficient than RL alone, (3) faithfulness improvements do not generalize to dissuading cues (as opposed to persuading cues), and (4) larger models do not show more faithful CoT out of the box, but they do benefit more from CST. These results suggest that CST can improve CoT faithfulness in general, with promising applications for CoT monitoring. Code for experiments in this paper is available at https://github.com/peterbhase/counterfactual-simulation-training
Sa2126 ETIOLOGY-SPECIFIC EUS-GUIDED SPLEEN SHEAR WAVE ELASTOGRAPHY THRESHOLDS FOR CSPH AND DECOMPENSATION RISK STRATIFICATION IN A RURAL APPALACHIAN MASH-DOMINANT COHORT WITH SEVERE OBESITY
Gastroenterology · 2026-05-01
article
Etiology-Specific EUS-Guided Spleen Shear Wave Elastography Thresholds for CSPH and Decompensation Risk Stratification in a Rural Appalachian MASH-Dominant Cohort with Severe Obesity
2026-01-01
article · Open access
Sa2126 ETIOLOGY-SPECIFIC EUS-GUIDED SPLEEN SHEAR WAVE ELASTOGRAPHY THRESHOLDS FOR CSPH AND DECOMPENSATION RISK STRATIFICATION IN A RURAL APPALACHIAN MASH-DOMINANT COHORT WITH SEVERE OBESITY
Gastrointestinal Endoscopy · 2026-05-01
article
Counterfactual Simulation Training for Chain-of-Thought Faithfulness
arXiv (Cornell University) · 2026-02-24
article · Open access · Senior author
Mo2098 ENDOSCOPIC ULTRASOUND-GUIDED SHEAR WAVE ELASTOGRAPHY VS FIBROSCAN FOR PORTAL HYPERTENSION RISK STRATIFICATION IN ADVANCED LIVER DISEASE: A HIGH-VOLUME ENDOHEPATOLOGY U.S. CENTER RETROSPECTIVE COHORT STUDY
Gastrointestinal Endoscopy · 2026-05-01
article
Mo2098 ENDOSCOPIC ULTRASOUND-GUIDED SHEAR WAVE ELASTOGRAPHY VS FIBROSCAN FOR PORTAL HYPERTENSION RISK STRATIFICATION IN ADVANCED LIVER DISEASE: A HIGH-VOLUME ENDOHEPATOLOGY U.S. CENTER RETROSPECTIVE COHORT STUDY
Gastroenterology · 2026-05-01
article
Do Language Models Use Their Depth Efficiently?
ArXiv.org · 2025-05-20 · 1 citation
preprint · Open access · Senior author
Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features to create higher-order computations that are impossible in shallow models, or do they merely spread the same kinds of computation out over more layers? To address these questions, we analyze the residual stream of the Llama 3.1, Qwen 3, and OLMo 2 family of models. We find: First, comparing the output of the sublayers to the residual stream reveals that layers in the second half contribute much less than those in the first half, with a clear phase transition between the two halves. Second, skipping layers in the second half has a much smaller effect on future computations and output predictions. Third, for multihop tasks, we are unable to find evidence that models are using increased depth to compose subresults in examples involving many hops. Fourth, we seek to directly address whether deeper models are using their additional layers to perform new kinds of computation. To do this, we train linear maps from the residual stream of a shallow model to a deeper one. We find that layers with the same relative depth map best to each other, suggesting that the larger model simply spreads the same computations out over its many layers. All this evidence suggests that deeper models are not using their depth to learn new kinds of computation, but only using the greater depth to perform more fine-grained adjustments to the residual. This may help explain why increasing scale leads to diminishing returns for stacked Transformer architectures.
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
arXiv (Cornell University) · 2025-01-28
preprint · Open access · Senior author
Fine-grained steering of language model outputs is essential for safety and reliability. Prompting and finetuning are widely used to achieve these goals, but interpretability researchers have proposed a variety of representation-based techniques as well, including sparse autoencoders (SAEs), linear artificial tomography, supervised steering vectors, linear probes, and representation finetuning. At present, there is no benchmark for making direct comparisons between these proposals. Therefore, we introduce AxBench, a large-scale benchmark for steering and concept detection, and report experiments on Gemma-2-2B and 9B. For steering, we find that prompting outperforms all existing methods, followed by finetuning. For concept detection, representation-based methods such as difference-in-means perform the best. On both evaluations, SAEs are not competitive. We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks. Along with AxBench, we train and publicly release SAE-scale feature dictionaries for ReFT-r1 and DiffMean.
WARP: An Efficient Engine for Multi-Vector Retrieval
2025-07-13 · 5 citations
article · Open access
Multi-vector retrieval methods such as ColBERT and its recent variant, the ConteXtualized Token Retriever (XTR), offer high accuracy but face efficiency challenges at scale. To address this, we present WARP, a retrieval engine that substantially improves the efficiency of retrievers trained with the XTR objective through three key innovations: (1) WARPSELECT for dynamic similarity imputation; (2) implicit decompression, avoiding costly vector reconstruction during retrieval; and (3) a two-stage reduction process for efficient score aggregation. Combined with highly-optimized C++ kernels, our system reduces end-to-end latency compared to XTR's reference implementation by 41x, and achieves a 3x speedup over the ColBERTv2/PLAID engine, while preserving retrieval quality. WARP also reduces index sizes by a factor of 2x-4x compared to XTR, enabling deployment on memory-constrained devices.

Recent grants

Expressive Content and the Semantics of Contexts
NSF · $218k · 2007–2012
RI: Medium: Bringing Sentiment Analysis and Social Network Analysis Together
NSF · $1.0M · 2012–2017

Frequent coauthors

Atticus Geiger
50 shared
Zhengxuan Wu
45 shared
Christopher D. Manning
44 shared
Omar Khattab
Stanford University
33 shared
Elisa Kreiss
33 shared
Noah D. Goodman
27 shared
Matei Zaharia
25 shared
Samuel R. Bowman
19 shared

Awards & honors

Stanford Honors Thesis Prizes - Symbolic Systems
Glushko Prize for Excellence in Undergraduate Research in Sy…
Barwise Award for Distinguished Contributions to Symbolic Sy…
Symbolic Systems Distinguished Teaching Award
