Christopher D. Manning

Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science · Verified

Stanford University · Linguistics

Active 1971–2026

h-index: 149

Citations: 194.9k

Papers: 859 (178 in the last 5 years)

Funding: —

About

Christopher Manning is the Thomas M. Siebel Professor in Machine Learning and a Professor of Linguistics and of Computer Science at Stanford University. He is a co-founder and Senior Fellow of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Since 2010, Manning's research has pioneered natural language understanding and inference using neural networks and deep learning, with impactful work on sentiment analysis, paraphrase detection, the GloVe model of word vectors, attention mechanisms, neural machine translation, question answering, self-supervised model pre-training, tree-recursive neural networks, machine reasoning, summarization, and dependency parsing. His contributions have been recognized with three successive ACL Test of Time Awards (2023–2025) and the IEEE John von Neumann Medal (2024).

Earlier in his career, Manning led the development of empirical, probabilistic approaches to NLP, computational linguistics, and language understanding, establishing theories and systems for natural language inference, syntactic parsing, machine translation, and multilingual language processing. He is a principal developer of Stanford Dependencies and Universal Dependencies, and has authored monographs on ergativity and complex predicates. Manning has also contributed significantly to NLP education through foundational textbooks and online courses, and has been an influential advocate for open source software in NLP with Stanford CoreNLP and Stanza.

He holds a B.A. from The Australian National University, a Ph.D. from Stanford, and an Honorary Doctorate from the University of Amsterdam. He held faculty positions at Carnegie Mellon University and the University of Sydney before returning to Stanford. He has served as President of the Association for Computational Linguistics and has received numerous honors, including election to the American Academy of Arts and Sciences and the National Academy of Engineering.

Research topics

Artificial Intelligence
Computer Science
Natural Language Processing
Machine Learning
Data Mining
Computer Security
Programming language
Information Retrieval
Political Science
Data science
Engineering ethics
Law
Speech recognition
Engineering
Psychology
Management science
Linguistics

Selected publications

SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization
arXiv (Cornell University) · 2026-05-08
preprint · Open access
LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure production code with relevant weaknesses to avoid in context, we find that they still produce verifiable vulnerabilities on average 23% of the time across a corpus of 250 benign coding prompts. We introduce SecureForge, an automated pipeline that both audits security risks of frontier models and produces auditing-informed secure system prompts that reduce output security vulnerabilities while maintaining unit test performance. SecureForge first identifies benign prompts that produce statically detectable vulnerabilities, and then amplifies them into a large synthetic prompt corpus of diverse scenarios using a Markovian sampling technique to jointly maintain error rates and prompt diversity. This corpus is then used to iteratively optimize the system prompts to reduce output security vulnerabilities. On frontier models, SecureForge yields a statistically significant Pareto improvement in both unit test success and output security, with output vulnerabilities reduced by up to 48%. The resulting system prompts transfer zero-shot to in-the-wild coding agent prompts, without any exposure to real user prompt distributions during optimization.
Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace
ArXiv.org · 2026-05-11
article · Open access
We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Lean. Shepherd records every agent-environment interaction as a typed event in a Git-like execution trace, enabling any past state to be forked and replayed. The system forks the agent process and its filesystem $5\times$ faster than Docker, achieving $>95\%$ prompt-cache reuse on replay. We demonstrate the model through three applications. First, in runtime intervention, a live supervisor increases pair coding pass rates from 28.8% to 54.7% on CooperBench. Second, in counterfactual meta-optimization, branching exploration outperforms baselines across four benchmarks by up to 11 points while reducing wall-clock time by up to 58%. Third, in Tree-RL training, forking rollouts at selected turns improves TerminalBench-2 performance from 34.2% to 39.4%. These results establish Shepherd as an efficient infrastructure for programming meta-agents. We open-source the system to support future research.
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
arXiv (Cornell University) · 2025-01-28
preprint · Open access
Fine-grained steering of language model outputs is essential for safety and reliability. Prompting and finetuning are widely used to achieve these goals, but interpretability researchers have proposed a variety of representation-based techniques as well, including sparse autoencoders (SAEs), linear artificial tomography, supervised steering vectors, linear probes, and representation finetuning. At present, there is no benchmark for making direct comparisons between these proposals. Therefore, we introduce AxBench, a large-scale benchmark for steering and concept detection, and report experiments on Gemma-2-2B and 9B. For steering, we find that prompting outperforms all existing methods, followed by finetuning. For concept detection, representation-based methods such as difference-in-means perform the best. On both evaluations, SAEs are not competitive. We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks. Along with AxBench, we train and publicly release SAE-scale feature dictionaries for ReFT-r1 and DiffMean.
AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County
ArXiv.org · 2025-02-12
preprint · Open access
Legal reform can be challenging in light of the volume, complexity, and interdependence of laws, codes, and records. One salient example of this challenge is the effort to restrict and remove racially restrictive covenants, clauses in property deeds that historically barred individuals of specific races from purchasing homes. Despite the Supreme Court holding such racial covenants unenforceable in 1948, they persist in property records across the United States. Many jurisdictions have moved to identify and strike these provisions, including California, which mandated in 2021 that all counties implement such a process. Yet the scale can be overwhelming, with Santa Clara County (SCC) alone having over 24 million property deed documents, making purely manual review infeasible. We present a novel approach to addressing this pressing issue, developed through a partnership with the SCC Clerk-Recorder's Office. First, we leverage an open large language model, finetuned to detect racial covenants with high precision and recall. We estimate that this system reduces manual efforts by 86,500 person hours and costs less than 2% of the cost for a comparable off-the-shelf closed model. Second, we illustrate the County's integration of this model into responsible operational practice, including legal review and the creation of a historical registry, and release our model to assist the hundreds of jurisdictions engaged in similar efforts. Finally, our results reveal distinct periods of utilization of racial covenants, sharp geographic clustering, and the disproportionate role of a small number of developers in maintaining housing discrimination. We estimate that by 1950, one in four properties across the County were subject to racial covenants.
Transcribe, Translate, or Transliterate: An Investigation of Intermediate Representations in Spoken Language Models
2025-12-06
article
Spoken language models (SLMs) that integrate speech with large language models (LMs) rely on modality adapters (MAs) to map the output of speech encoders to a representation that is understandable to the decoder LM. Yet we know very little about how these crucial MAs transform representations. Here we examine the MA output representation in three SLMs (SALMONN, Qwen2-Audio and Phi-4-Multimodal-Instruct). By finding the nearest decoder LM token to an MA representation, we uncover two strategies for MA representations. For models using a Whisper encoder, MAs appear to represent the meaning of the input using an English-based interlingua, allowing them to handle languages unseen in instruction tuning. For models that don’t, like Phi-4-Multimodal-Instruct, MAs instead represent the phonetics of the input, but expressed with English words. We hypothesize that which strategy arises depends on whether the speech encoder is trained only for speech recognition or also for translation.
Quantifying large language model usage in scientific papers
Nature Human Behaviour · 2025-08-04 · 30 citations
article
Do Language Models Use Their Depth Efficiently?
ArXiv.org · 2025-05-20 · 1 citation
preprint · Open access
Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features to create higher-order computations that are impossible in shallow models, or do they merely spread the same kinds of computation out over more layers? To address these questions, we analyze the residual stream of the Llama 3.1, Qwen 3, and OLMo 2 family of models. We find: First, comparing the output of the sublayers to the residual stream reveals that layers in the second half contribute much less than those in the first half, with a clear phase transition between the two halves. Second, skipping layers in the second half has a much smaller effect on future computations and output predictions. Third, for multihop tasks, we are unable to find evidence that models are using increased depth to compose subresults in examples involving many hops. Fourth, we seek to directly address whether deeper models are using their additional layers to perform new kinds of computation. To do this, we train linear maps from the residual stream of a shallow model to a deeper one. We find that layers with the same relative depth map best to each other, suggesting that the larger model simply spreads the same computations out over its many layers. All this evidence suggests that deeper models are not using their depth to learn new kinds of computation, but only using the greater depth to perform more fine-grained adjustments to the residual. This may help explain why increasing scale leads to diminishing returns for stacked Transformer architectures.
A New Pair of GloVes
ArXiv.org · 2025-07-24 · 1 citation
preprint · Open access · Senior author
This report documents, describes, and evaluates new 2024 English GloVe (Global Vectors for Word Representation) models. While the original GloVe models built in 2014 have been widely used and found useful, languages and the world continue to evolve and we thought that current usage could benefit from updated models. Moreover, the 2014 models were not carefully documented as to the exact data versions and preprocessing that were used, and we rectify this by documenting these new models. We trained two sets of word embeddings using Wikipedia, Gigaword, and a subset of Dolma. Evaluation through vocabulary comparison, direct testing, and NER tasks shows that the 2024 vectors incorporate new culturally and linguistically relevant words, perform comparably on structural tasks like analogy and similarity, and demonstrate improved performance on recent, temporally dependent NER datasets such as non-Western newswire data.

Frequent coauthors

Percy Liang
46 shared
Christopher Potts
44 shared
Marie-Catherine de Marneffe
39 shared
Kevin Clark
Cures Within Reach
37 shared
Richard Socher
34 shared
Minh-Thang Luong
Viet Tri University of Industry
29 shared
Shikhar Murty
28 shared
Sebastian Schuster
28 shared

Labs

The Stanford Natural Language Processing Group · PI
Performing groundbreaking Natural Language Processing research since 1999.

Awards & honors

Best Paper Award at EACL 2026
10-year Test of Time Award at ACL 2025
ACL Test of Time Awards (2023–2025)
IEEE John von Neumann Medal (2024)
American Academy of Arts and Sciences (2025)
