Tal Linzen

Assistant Professor of Linguistics and Data Science · Verified

New York University · Linguistics and Data Science

Active 2002–2026

h-index: 41
Citations: 9.4k
Papers: 230 (122 in the last 5 years)
Funding: $953k

About

Tal Linzen studies the linguistic capabilities of neural language models and how they relate to human language processing. His research examines how models learn syntactic constraints, semantic representations, and generalization patterns, often by manipulating training data and analyzing in-context learning. His contributions include investigating how linguistic biases emerge in large language models, evaluating their reasoning abilities, and testing their capacity for compositional generalization and semantic understanding. He also develops benchmarks and methodologies for assessing the linguistic and cognitive properties of language models. The work spans computational linguistics, psycholinguistics, and artificial intelligence, and aims to bridge the gap between machine learning models and human linguistic behavior.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Natural Language Processing
  • Mathematics
  • Psychology
  • Programming language
  • Statistics
  • Data science
  • Linguistics

Selected publications

  • The Syntactic Ambiguity Processing Benchmark

    OSF Preprints · 2026-02-04

    other

    Materials, raw data, analysis scripts and results from Timkey et al. (2025) are in the Timkey_et_al_2025_eyetracking folder. Materials and raw data from Huang et al. (2024) are in the Huang_et_al_2024_spr folder. For analysis scripts and result output from Huang et al. (2024), please see https://github.com/caplabnyu/sapbenchmark

  • BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop

    arXiv (Cornell University) · 2026-02-23

    preprint · Open access

    The goal of the BabyLM is to stimulate new research connections between cognitive modeling and language model pretraining. We invite contributions in this vein to the BabyLM Workshop, which will also include the 4th iteration of the BabyLM Challenge. As in previous years, the challenge features two "standard" tracks (Strict and Strict-Small), in which participants must train language models on under 100M or 10M words of data, respectively. This year, we move beyond our previous English-only pretraining datasets with a new Multilingual track, focusing on English, Dutch, and Chinese. For the workshop, we call for papers related to the overall theme of BabyLM, which includes training efficiency, small-scale training datasets, cognitive modeling, model evaluation, and architecture innovation.

  • Why are language models less surprised than humans? Testing the Parse Multiplicity Mismatch Hypothesis

    arXiv (Cornell University) · 2026-05-14

    preprint · Open access · Senior author

    Surprisal theory posits that the processing difficulty of a word is determined by its predictability in context, offering a potential link between human sentence processing and next-word predictions from language models. While language model (LM) surprisals successfully predict reading times in naturalistic text, they systematically underpredict the magnitude of difficulty observed in controlled studies of syntactic ambiguity, particularly in garden path sentences. This mismatch might arise from differences in the computational constraints between humans and LMs. Here we test one such hypothesis, specifically, that LMs may be able to simultaneously consider a greater number of distinct sentence interpretations at once, compared to humans. Using Recurrent Neural Network Grammars (RNNGs) with word-synchronous beam search, we systematically vary the number of simultaneous parses used to compute word surprisal, and then use these surprisals to predict human reading times. Reducing the number of simultaneous active parses indeed increases the magnitude of predicted garden path effects, but not nearly enough to capture the full magnitude of the effects in humans. This suggests that differences in the number of simultaneous parses available to LMs and humans cannot reconcile LM-based surprisal with human sentence processing.

  • Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

    ArXiv.org · 2026-05-13

    article · Open access

    Data mixing decides how to combine different sources or types of data and is a consequential problem throughout language model training. In pretraining, data composition is a key determinant of model quality; in continual learning and adaptation, it governs what is retained and acquired. Yet existing data mixing methods address only one phase of this lifecycle at a time: some require smaller proxy models tied to a single training phase, others assume a fixed domain set, and continual learning lacks principled guidance altogether. We argue that data mixing is fundamentally an online decision making problem -- one that recurs throughout training and demands a single, unified solution. We introduce OP-Mix (On-Policy Mix), a data mixing algorithm that operates across the entire language model training lifecycle. Our main insight is that candidate data mixtures can be cheaply simulated by interpolating between low-rank adapters trained directly on the current model, eliminating separate proxy models and ensuring the search is always grounded in the model's actual learning dynamics. Across pretraining, continual midtraining, and continual instruction tuning, OP-Mix consistently finds near-optimal mixtures while using a fraction of the compute of the baselines. In pretraining, OP-Mix improves upon training without mixing by 6.3% in average perplexity. For continual learning, OP-Mix matches the performance of both retraining and on-policy distillation while using 66% and 95% less overall compute, respectively. OP-Mix suggests a different view of language model training: not a sequence of distinct phases, but a single continuous process of learning from data.

  • MNE-Python

    Zenodo (CERN European Organization for Nuclear Research) · 2026-04-07

    other · Open access

    v1.12.0

  • Evaluating In-Context Translation with Synchronous Context-Free Grammar Transduction

    arXiv (Cornell University) · 2026-04-08

    preprint · Open access · Senior author

    Low-resource languages pose a challenge for machine translation with large language models (LLMs), which require large amounts of training data. One potential way to circumvent this data dependence is to rely on LLMs' ability to use in-context descriptions of languages, like textbooks and dictionaries. To do so, LLMs must be able to infer the link between the languages' grammatical descriptions and the sentences in question. Here we isolate this skill using a formal analogue of the task: string transduction based on a formal grammar provided in-context. We construct synchronous context-free grammars which define pairs of formal languages designed to model particular aspects of natural language grammar, morphology, and written representation. Using these grammars, we measure how well LLMs can translate sentences from one formal language into another when given both the grammar and the source-language sentence. We vary the size of the grammar, the lengths of the sentences, the syntactic and morphological properties of the languages, and their written script. We note three key findings. First, LLMs' translation accuracy decreases markedly as a function of grammar size and sentence length. Second, differences in morphology and written representation between the source and target languages can strongly diminish model performance. Third, we examine the types of errors committed by models and find they are most prone to recall the wrong words from the target language vocabulary, hallucinate new words, or leave source-language words untranslated.

  • BabyLM Turns 4 and Goes Multilingual: Call for Papers for the 2026 BabyLM Workshop

    ArXiv.org · 2026-01-01

    article · Open access

    BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call for both workshop papers and for researchers to join the 4th BabyLM competition. As in previous years, we call for participants in the data-efficient pretraining challenge in the general track. This year, we also offer a new track: Multilingual. We also call for papers outside the competition in any relevant areas. These include training efficiency, cognitively plausible research, weak model evaluation, and more.

  • Deconstructing sentence disambiguation by joint latent modeling of reading paradigms: LLM surprisal is not enough

    arXiv (Cornell University) · 2026-02-04

    article · Open access

    Using temporarily ambiguous garden-path sentences ("While the team trained the striker wondered ...") as a test case, we present a latent-process mixture model of human reading behavior across four different reading paradigms (eye tracking, uni- and bidirectional self-paced reading, Maze). The model distinguishes between garden-path probability, garden-path cost, and reanalysis cost, and yields more realistic processing cost estimates by taking into account trials with inattentive reading. We show that the model is able to reproduce empirical patterns with regard to rereading behavior, comprehension question responses, and grammaticality judgments. Cross-validation reveals that the mixture model also has better predictive fit to human reading patterns and end-of-trial task data than a mixture-free model based on GPT-2-derived surprisal values. We discuss implications for future work.

  • Deconstructing sentence disambiguation by joint latent modeling of reading paradigms: LLM surprisal is not enough

    Open MIND · 2026-02-04

    preprint

  • MNE-Python

    Open MIND · 2026-01-01

    software · Open access

    v1.12.1

Labs

  • Computation and Psycholinguistics Lab · PI

    What are the mental representations that constitute our knowledge of language? How do we use them to understand and produce language? How can we create computational systems that learn language as efficiently and robustly as humans?
