Jonathan Cohen

Robert Bendheim and Lynn Bendheim Thoman Professor in Neuroscience

Princeton University · Psychology

Active 1950–2025

h-index: 162
Citations: 196.0k
Papers: 628 (123 in the last 5 years)
Funding: $80.1M (1 active)
About

Jonathan Cohen is the Robert Bendheim and Lynn Bendheim Thoman Professor in Neuroscience at Princeton University, affiliated with the Princeton Neuroscience Institute. His research focuses on the neurobiological mechanisms underlying cognitive control and their disturbance in psychiatric disorders such as schizophrenia and depression. Cognitive control refers to the ability to guide attention, thought, and action in accordance with goals or intentions. His work aims to develop mechanistically explicit hypotheses about the functioning of brain systems involved in cognitive control, including the prefrontal cortex, anterior cingulate cortex, basal ganglia, and brainstem neuromodulatory systems, and to test these hypotheses through empirical studies. A key motivation of his research is to establish a theoretically sound foundation for understanding how disturbances in brain function manifest as disorders of thought and behavior in psychiatric illnesses.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Cognitive science
  • Psychology
  • Cognitive psychology
  • Neuroscience
  • Management science
  • Econometrics
  • Human–computer interaction
  • Economics

Selected publications

  • Granting Unemployment Insurance Benefits in Borderline Cases Helps Workers at Low Cost

    Employment Research · 2025-01-01

    article · Open access · 1st author · Corresponding
  • Learning expectations shape cognitive control allocation

    2025-03-27 · 1 citation

    preprint · Open access · Senior author

    Current models frame the allocation of cognitive control as a process of expected utility maximization. The benefits of a candidate control signal are weighed against its costs (e.g., opportunity costs). Recent theorizing has found that, despite promoting the counterintuitive behavior of longer deliberation, which is less rewarding in the short term, it is nevertheless normative to account for the value of learning when determining control allocation. Here, we sought to test this proposal by examining whether people were willing to allocate greater control and thereby expend greater effort (e.g., deliberate for longer) when they perceived a task to be learnable compared to when they did not. We found that participants' proficiency and learning rate in the first block of a simple perceptual dot-motion task were able to predict their willingness to deliberate in a second block. These findings support the hypothesis that agents consider learnability when allocating cognitive control, and comply with a formal model of control allocation that considers the future discounted value of learning on reward.

  • NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model

    ArXiv.org · 2025-08-20

    preprint · Open access

    We introduce Nemotron-Nano-9B-v2, a hybrid Mamba-Transformer language model designed to increase throughput for reasoning workloads while achieving state-of-the-art accuracy compared to similarly-sized models. Nemotron-Nano-9B-v2 builds on the Nemotron-H architecture, in which the majority of the self-attention layers in the common Transformer architecture are replaced with Mamba-2 layers, to achieve improved inference speed when generating the long thinking traces needed for reasoning. We create Nemotron-Nano-9B-v2 by first pre-training a 12-billion-parameter model (Nemotron-Nano-12B-v2-Base) on 20 trillion tokens using an FP8 training recipe. After aligning Nemotron-Nano-12B-v2-Base, we employ the Minitron strategy to compress and distill the model with the goal of enabling inference on up to 128k tokens on a single NVIDIA A10G GPU (22GiB of memory, bfloat16 precision). Compared to existing similarly-sized models (e.g., Qwen3-8B), we show that Nemotron-Nano-9B-v2 achieves on-par or better accuracy on reasoning benchmarks while achieving up to 6x higher inference throughput in reasoning settings like 8k input and 16k output tokens. We are releasing Nemotron-Nano-9B-v2, Nemotron-Nano-12B-v2-Base, and Nemotron-Nano-9B-v2-Base checkpoints along with the majority of our pre- and post-training datasets on Hugging Face.

  • Adapting to loss: A computational model of grief.

    Psychological Review · 2025-05-27 · 4 citations

    article · Open access · Senior author

    [...] And why is it sometimes prolonged enough to be clinically impairing? Using the framework of reinforcement learning with memory replay, we offer answers to these questions and suggest, counterintuitively, that grief may function to maximize future reward. That is, grieving may help to unlearn old habits so that alternative sources of reward can be found. We additionally perform a set of simulations that identify and explore optimal grieving parameters and use our model to account for empirical phenomena such as individual differences in human grief trajectories.

  • Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers

    ArXiv.org · 2025-05-19

    preprint · Open access

    We present causal head gating (CHG), a scalable method for interpreting the functional roles of attention heads in transformer models. CHG learns soft gates over heads and assigns them a causal taxonomy - facilitating, interfering, or irrelevant - based on their impact on task performance. Unlike prior approaches in mechanistic interpretability, which are hypothesis-driven and require prompt templates or target labels, CHG applies directly to any dataset using standard next-token prediction. We evaluate CHG across multiple large language models (LLMs) in the Llama 3 model family and diverse tasks, including syntax, commonsense, and mathematical reasoning, and show that CHG scores yield causal, not merely correlational, insight validated via ablation and causal mediation analyses. We also introduce contrastive CHG, a variant that isolates sub-circuits for specific task components. Our findings reveal that LLMs contain multiple sparse task-sufficient sub-circuits, that individual head roles depend on interactions with others (low modularity), and that instruction following and in-context learning rely on separable mechanisms.

  • Learning expectations shape cognitive control allocation

    Proceedings of the National Academy of Sciences · 2025-10-27 · 1 citation

    article · Open access · Senior author

    Current models frame the allocation of cognitive control as a process of expected utility maximization. The benefits of a candidate control signal are weighed against its costs (e.g., opportunity costs). Recent theorizing has found that, despite promoting the counterintuitive behavior of longer deliberation, which is less rewarding in the short term, it is nevertheless normative to account for the value of learning when determining control allocation. Here, we sought to test this proposal by examining whether people were willing to allocate greater control and thereby expend greater effort (e.g., deliberate for longer) when they perceived a task to be learnable compared to when they did not. We found that participants' proficiency and learning rate in the first block of a simple perceptual dot-motion task were able to predict their willingness to deliberate in a second block. These findings support the hypothesis that agents consider learnability when allocating cognitive control, and comply with a formal model of control allocation that considers the future discounted value of learning on reward.

  • Understanding Task Representations in Neural Networks via Bayesian Ablation

    arXiv (Cornell University) · 2025-05-19

    preprint · Open access

    Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.

  • Skill Depreciation during Unemployment: Evidence from Panel Data

    American Economic Journal Applied Economics · 2025-06-26 · 3 citations

    article · 1st author · Corresponding

    We examine the depreciation of skills among unemployed German workers using a panel of skill measures linked to administrative data. Both the reemployment hazard and reemployment earnings steadily decline with unemployment duration. Indicators of depression and loneliness also rise substantially. However, we find no decline in a wide range of cognitive and noncognitive skills while workers remain unemployed. We find the same pattern in a panel of American workers. The results imply that skill depreciation in general human capital is unlikely to be a major explanation for observed duration dependence in reemployment outcomes. (JEL E24, E32, J22, J24, J31, J64)

  • Prenatal exposure to polycyclic aromatic hydrocarbons, reduced hippocampal subfield volumes, and word reading

    Developmental Cognitive Neuroscience · 2025-01-11 · 3 citations

    article · Open access

    Reading difficulties and exposure to air pollution are both disproportionately high among youth living in economically disadvantaged contexts. Critically, variance in reading skills in youth living in higher socioeconomic status (SES) contexts largely derives from genetic factors, whereas environmental factors explain more of the variance in reading skills among youth living in lower SES contexts. Although reading research has focused closely on the psychosocial environment, little focus has been paid to the effects of the chemical environment. In this study, we measured prenatal exposure to a common air pollutant, polycyclic aromatic hydrocarbons (PAH), via the presence (versus absence) of PAH-DNA adducts in maternal blood during the third trimester of pregnancy. We examined the impact of prenatal PAH exposure on adolescent hippocampal subfield volume and on word reading in a sample of youth followed prospectively since birth (N = 165). Compared to those without prenatal exposure, those with detectable PAH-DNA adducts (n = 63) exhibited significantly smaller hippocampal volumes (CA2/3 subfield, t = -2.413, p < .05), which was associated with worse pseudoword reading (t = 2.346, p < .05). Exploratory mediation analyses showed a significant effect of PAH on pseudoword reading through CA2/3 volume (p = .028), suggesting that prenatal PAH exposure affects hippocampal volume with downstream effects on reading ability.

    Highlights: Youth with prenatal PAH exposure had smaller left hippocampal CA2/3 volume; smaller left CA2/3 volume was associated with worse pseudoword reading; prenatal PAH influenced pseudoword reading through effects on left CA2/3 volume; PAH exposure was associated with reduced cortical thickness in reading circuit; reading circuit cortical thickness was not associated with word reading skills.

  • Determinantal Point Process Attention Over Grid Cell Code Supports Out of Distribution Generalization

    eLife · 2024-03-21

    preprint · Open access · Senior author

    Deep neural networks have made tremendous gains in emulating human-like intelligence, and have been used increasingly as ways of understanding how the brain may solve the complex computational problems on which this relies. However, these still fall short of, and therefore fail to provide insight into how the brain supports strong forms of generalization of which humans are capable. One such case is out-of-distribution (OOD) generalization: successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. First we draw on the fact that the mammalian brain represents metric spaces using grid cell code (e.g., in the entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), that we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in the grid cell code, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides both an interpretation of how the grid cell code in the mammalian brain may contribute to generalization performance, and at the same time a potential means for improving such capabilities in artificial neural networks.

Frequent coauthors

  • Cameron S. Carter

    University of California, Davis

    113 shared
  • Sebastian Musslick

    Osnabrück University

    78 shared
  • Douglas C. Noll

    University of Michigan–Ann Arbor

    78 shared
  • Amitai Shenhav

    University of California System

    69 shared
  • Todd S. Braver

    69 shared
  • Matthew Botvinick

    66 shared
  • Samuel M. McClure

    65 shared
  • Deanna M. Barch

    Washington University in St. Louis

    65 shared

Education

  • Ph.D., Psychology

    Carnegie Mellon University

    1990
  • Internship and Residency, Psychiatry and Behavioral Sciences

    Stanford University

    1989
  • M.D., Medical School

    University of Pennsylvania

    1983
  • B.A., Philosophy and Biology

    Yale University

    1977

Awards & honors

  • American Academy of Arts and Sciences (elected)
  • APS William James Award