Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach grounded in each professor's actual work, not in templates.

Free to start · No credit card · Cancel anytime
Christopher D. Manning

Thomas M. Siebel Professor in Machine Learning, Professor of Linguistics and of Computer Science · Verified

Stanford University · Linguistics

Active 1971–2026

h-index: 149
Citations: 194.9k
Papers: 859 (178 in the last 5 years)

About

Christopher Manning is the Thomas M. Siebel Professor in Machine Learning and a Professor of Linguistics and of Computer Science at Stanford University. He is a co-founder and Senior Fellow of the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

Since 2010, Manning's research has pioneered natural language understanding and inference using neural networks and deep learning, with influential work on sentiment analysis, paraphrase detection, the GloVe model of word vectors, attention mechanisms, neural machine translation, question answering, self-supervised model pre-training, tree-recursive neural networks, machine reasoning, summarization, and dependency parsing. This work has been recognized with three successive ACL Test of Time Awards (2023–2025) and the IEEE John von Neumann Medal (2024). Earlier in his career, Manning led the development of empirical, probabilistic approaches to NLP, computational linguistics, and language understanding, establishing theories and systems for natural language inference, syntactic parsing, machine translation, and multilingual language processing. He is a principal developer of Stanford Dependencies and Universal Dependencies, and has authored monographs on ergativity and complex predicates.

Manning has also contributed significantly to NLP education through foundational textbooks and online courses, and has been an influential advocate of open-source NLP software through Stanford CoreNLP and Stanza. He holds a B.A. from The Australian National University, a Ph.D. from Stanford, and an honorary doctorate from the University of Amsterdam. Before returning to Stanford, he held faculty positions at Carnegie Mellon University and the University of Sydney. He has served as President of the Association for Computational Linguistics, and his honors include election to the American Academy of Arts and Sciences and the National Academy of Engineering.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Natural Language Processing
  • Machine Learning
  • Data Mining
  • Computer Security
  • Programming Languages
  • Information Retrieval
  • Political Science
  • Data Science
  • Engineering Ethics
  • Law
  • Speech Recognition
  • Engineering
  • Psychology
  • Management Science
  • Linguistics

Selected publications

  • SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization

    arXiv · 2026-05-08

    Preprint · Open access

    LLM coding agents now generate code at an unprecedented scale, yet LLM-generated code introduces cybersecurity vulnerabilities into codebases without human involvement. Even when frontier models are explicitly asked to write secure production code with relevant weaknesses to avoid in context, we find that they still produce verifiable vulnerabilities on average 23% of the time across a corpus of 250 benign coding prompts. We introduce SecureForge, an automated pipeline that both audits security risks of frontier models and produces auditing-informed secure system prompts that reduce output security vulnerabilities while maintaining unit test performance. SecureForge first identifies benign prompts that produce statically detectable vulnerabilities, and then amplifies them into a large synthetic prompt corpus of diverse scenarios using a Markovian sampling technique to jointly maintain error rates and prompt diversity. This corpus is then used to iteratively optimize the system prompts to reduce output security vulnerabilities. On frontier models, SecureForge yields a statistically significant Pareto improvement in both unit test success and output security, with output vulnerabilities reduced by up to 48%. The resulting system prompts transfer zero-shot to in-the-wild coding agent prompts, without any exposure to real user prompt distributions during optimization.

  • Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace

    arXiv · 2026-05-11

    Preprint · Open access

    We introduce Shepherd, a functional programming model that formalizes meta-agent operations on target agents as functions, with core operations mechanized in Lean. Shepherd records every agent-environment interaction as a typed event in a Git-like execution trace, enabling any past state to be forked and replayed. The system forks the agent process and its filesystem 5× faster than Docker, achieving >95% prompt-cache reuse on replay. We demonstrate the model through three applications. First, in runtime intervention, a live supervisor increases pair coding pass rates from 28.8% to 54.7% on CooperBench. Second, in counterfactual meta-optimization, branching exploration outperforms baselines across four benchmarks by up to 11 points while reducing wall-clock time by up to 58%. Third, in Tree-RL training, forking rollouts at selected turns improves TerminalBench-2 performance from 34.2% to 39.4%. These results establish Shepherd as an efficient infrastructure for programming meta-agents. We open-source the system to support future research.

  • AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

    arXiv · 2025-01-28

    Preprint · Open access

    Fine-grained steering of language model outputs is essential for safety and reliability. Prompting and finetuning are widely used to achieve these goals, but interpretability researchers have proposed a variety of representation-based techniques as well, including sparse autoencoders (SAEs), linear artificial tomography, supervised steering vectors, linear probes, and representation finetuning. At present, there is no benchmark for making direct comparisons between these proposals. Therefore, we introduce AxBench, a large-scale benchmark for steering and concept detection, and report experiments on Gemma-2-2B and 9B. For steering, we find that prompting outperforms all existing methods, followed by finetuning. For concept detection, representation-based methods such as difference-in-means perform the best. On both evaluations, SAEs are not competitive. We introduce a novel weakly-supervised representational method (Rank-1 Representation Finetuning; ReFT-r1), which is competitive on both tasks while providing the interpretability advantages that prompting lacks. Along with AxBench, we train and publicly release SAE-scale feature dictionaries for ReFT-r1 and DiffMean.

  • AI for Scaling Legal Reform: Mapping and Redacting Racial Covenants in Santa Clara County

    arXiv · 2025-02-12

    Preprint · Open access

    Legal reform can be challenging in light of the volume, complexity, and interdependence of laws, codes, and records. One salient example of this challenge is the effort to restrict and remove racially restrictive covenants, clauses in property deeds that historically barred individuals of specific races from purchasing homes. Despite the Supreme Court holding such racial covenants unenforceable in 1948, they persist in property records across the United States. Many jurisdictions have moved to identify and strike these provisions, including California, which mandated in 2021 that all counties implement such a process. Yet the scale can be overwhelming, with Santa Clara County (SCC) alone having over 24 million property deed documents, making purely manual review infeasible. We present a novel approach to addressing this pressing issue, developed through a partnership with the SCC Clerk-Recorder's Office. First, we leverage an open large language model, finetuned to detect racial covenants with high precision and recall. We estimate that this system reduces manual efforts by 86,500 person hours and costs less than 2% of the cost for a comparable off-the-shelf closed model. Second, we illustrate the County's integration of this model into responsible operational practice, including legal review and the creation of a historical registry, and release our model to assist the hundreds of jurisdictions engaged in similar efforts. Finally, our results reveal distinct periods of utilization of racial covenants, sharp geographic clustering, and the disproportionate role of a small number of developers in maintaining housing discrimination. We estimate that by 1950, one in four properties across the County were subject to racial covenants.

  • Transcribe, Translate, or Transliterate: An Investigation of Intermediate Representations in Spoken Language Models

    2025-12-06

    Article

    Spoken language models (SLMs) that integrate speech with large language models (LMs) rely on modality adapters (MAs) to map the output of speech encoders to a representation that is understandable to the decoder LM. Yet we know very little about how these crucial MAs transform representations. Here we examine the MA output representation in three SLMs (SALMONN, Qwen2-Audio and Phi-4-Multimodal-Instruct). By finding the nearest decoder LM token to an MA representation, we uncover two strategies for MA representations. For models using a Whisper encoder, MAs appear to represent the meaning of the input using an English-based interlingua, allowing them to handle languages unseen in instruction tuning. For models that don't, like Phi-4-Multimodal-Instruct, MAs instead represent the phonetics of the input, but expressed with English words. We hypothesize that which strategy arises depends on whether the speech encoder is trained only for speech recognition or also for translation.

  • Quantifying large language model usage in scientific papers

    Nature Human Behaviour · 2025-08-04 · 30 citations

    Article

  • Do Language Models Use Their Depth Efficiently?

    arXiv · 2025-05-20 · 1 citation

    Preprint · Open access

    Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features to create higher-order computations that are impossible in shallow models, or do they merely spread the same kinds of computation out over more layers? To address these questions, we analyze the residual stream of the Llama 3.1, Qwen 3, and OLMo 2 families of models. We find: First, comparing the output of the sublayers to the residual stream reveals that layers in the second half contribute much less than those in the first half, with a clear phase transition between the two halves. Second, skipping layers in the second half has a much smaller effect on future computations and output predictions. Third, for multihop tasks, we are unable to find evidence that models are using increased depth to compose subresults in examples involving many hops. Fourth, we seek to directly address whether deeper models are using their additional layers to perform new kinds of computation. To do this, we train linear maps from the residual stream of a shallow model to a deeper one. We find that layers with the same relative depth map best to each other, suggesting that the larger model simply spreads the same computations out over its many layers. All this evidence suggests that deeper models are not using their depth to learn new kinds of computation, but only using the greater depth to perform more fine-grained adjustments to the residual. This may help explain why increasing scale leads to diminishing returns for stacked Transformer architectures.

  • A New Pair of GloVes

    arXiv · 2025-07-24 · 1 citation

    Preprint · Open access · Senior author

    This report documents, describes, and evaluates new 2024 English GloVe (Global Vectors for Word Representation) models. While the original GloVe models built in 2014 have been widely used and found useful, languages and the world continue to evolve and we thought that current usage could benefit from updated models. Moreover, the 2014 models were not carefully documented as to the exact data versions and preprocessing that were used, and we rectify this by documenting these new models. We trained two sets of word embeddings using Wikipedia, Gigaword, and a subset of Dolma. Evaluation through vocabulary comparison, direct testing, and NER tasks shows that the 2024 vectors incorporate new culturally and linguistically relevant words, perform comparably on structural tasks like analogy and similarity, and demonstrate improved performance on recent, temporally dependent NER datasets such as non-Western newswire data.

Frequent coauthors

  • Percy Liang

    46 shared
  • Christopher Potts

    44 shared
  • Marie-Catherine de Marneffe

    39 shared
  • Kevin Clark

    Cures Within Reach

    37 shared
  • Richard Socher

    34 shared
  • Minh-Thang Luong

    Viet Tri University of Industry

    29 shared
  • Shikhar Murty

    28 shared
  • Sebastian Schuster

    28 shared


Awards & honors

  • Best Paper Award at EACL 2026
  • 10-year Test of Time Award at ACL 2025
  • ACL Test of Time Awards (2023–2025)
  • IEEE John von Neumann Medal (2024)
  • American Academy of Arts and Sciences (2025)

See your match with Christopher D. Manning

PhdFit ranks faculty by your research interests, methods, and publications, grounded in their actual work rather than templates.

  • Resume-aware match score
  • Save to shortlist
  • AI-drafted outreach
  • Free to start
  • No credit card
  • 30-second signup