Kyle Mahowald

Assistant Professor · Verified

University of Texas at Austin · Linguistics

Active 2010–2026

h-index: 27
Citations: 3.4k
Papers: 116 (79 in last 5y)
Funding: $175k

About

Kyle Mahowald is an Assistant Professor in the College of Liberal Arts at the University of Texas at Austin. His research focuses on computational linguistics, psycholinguistics, and quantitative methods in linguistics. He also works on natural language processing, large language models, and cognitive science, with an emphasis on language processing and computational modeling.

Research topics

  • Computer Science
  • Psychology
  • Cognitive science
  • Natural Language Processing
  • Linguistics
  • Artificial Intelligence
  • Social psychology
  • Cognitive psychology
  • Machine Learning
  • Philosophy
  • Information Retrieval
  • Neuroscience
  • Epistemology
  • Mathematics
  • Statistics
  • Developmental psychology
  • Econometrics

Selected publications

  • Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

    arXiv (Cornell University) · 2026-01-08

    preprint · Open access · Senior author

    Large vision-language models (VLMs) are highly capable, yet often hallucinate by favoring textual prompts over visual evidence. We study this failure mode in a controlled object-counting setting, where the prompt overstates the number of objects in the image (e.g., asking a model to describe four waterlilies when only three are present). At low object counts, models often correct the overestimation, but as the number of objects increases, they increasingly conform to the prompt regardless of the discrepancy. Through mechanistic analysis of three VLMs, we identify a small set of attention heads whose ablation substantially reduces prompt-induced hallucinations (PIH) by at least 40% without additional training. Across models, PIH-heads mediate prompt copying in model-specific ways. We characterize these differences and show that PIH ablation increases correction toward visual evidence. Our findings offer insights into the internal mechanisms driving prompt-induced hallucinations, revealing model-specific differences in how these behaviors are implemented.

  • Documents and Material

    OSF Preprints · 2026-04-21

    other

  • Registered Report: Knowledge vs. Ignorance

    OSF Preprints · 2026-03-27

    other

  • LLMs and people both learn to form conventions -- just not with each other

    arXiv (Cornell University) · 2026-02-09

    article · Open access

    Humans align to one another in conversation -- adopting shared conventions that ease communication. We test whether LLMs form the same kinds of conventions in a multimodal communication game. Both humans and LLMs display evidence of convention-formation (increasing the accuracy and consistency of their turns while decreasing their length) when communicating in same-type dyads (humans with humans, AI with AI). However, heterogenous human-AI pairs fail -- suggesting differences in communicative tendencies. In Experiment 2, we ask whether LLMs can be induced to behave more like human conversants, by prompting them to produce superficially humanlike behavior. While the length of their messages matches that of human pairs, accuracy and lexical overlap in human-LLM pairs continues to lag behind that of both human-human and AI-AI pairs. These results suggest that conversational alignment requires more than just the ability to mimic previous interactions, but also shared interpretative biases toward the meanings that are conveyed.

  • Cross-cultural structures of personal name systems reflect general communicative principles

    Nature Communications · 2026-01-19

    article · Open access · Senior author

    The structure of personal names appears to differ widely across cultures. Using census records and historical datasets, we present an information-theoretic analysis of name systems that shows how the scope of this variation is more constrained than it might appear. We identify two constraints name systems must satisfy: encoding large numbers of identities, and ensuring these encodings are usable. We show that, historically, the world's languages satisfied these constraints using structurally similar, near-optimal codes. They did so by combining sets of name-specific words with existing vocabulary items, allowing unlimited numbers of identifiers to be created while keeping vocabulary sizes stable. Today, many natural name systems have been transformed into official codes based on hereditary patronyms. We show how, globally, these changes differentially altered the information structure of codes, leading to cross-cultural differences in the way names function as individuators that can have tangible effects in domains like scientific publishing.

  • ResearchBox 4207, 'A suite of LMs comprehend puzzle statements as well as humans', https://researchbox.org/4207

    Zenodo (CERN European Organization for Nuclear Research) · 2026-03-11

    other · Open access

    Box title: 'A suite of LMs comprehend puzzle statements as well as humans'. Reference: Supantho Rakshit, Jennifer Hu, Kyle Mahowald, and Adele E. Goldberg, 'A suite of LMs comprehend puzzle statements as well or better than humans', Open Mind. https://www.dropbox.com/scl/fi/ve2xchi0mlf96me4nemhd/2026_A-suite-of-LMs_final.pdf?rlkey=1ufz5d0fa4smuvqfrfvnniilo&dl=0 Note: this backup was created automatically by a ResearchBox bot.

  • A Suite of LMs Comprehend Puzzle Statements as Well or Better Than Humans

    Open Mind · 2026-01-01

    article · Open access

    This paper reexamines a recent claim that Large Language Models lag behind humans in language comprehension on what were described as minimally complex statements. We argue that human performance was overestimated and LM performance, underestimated. Moreover, both people and lower-performing LMs are disproportionately challenged by queries involving potentially appropriate inferences, suggesting shared pragmatic sensitivity rather than model-specific deficits. Analysis of more sensitive log probabilities of Llama-2-70B demonstrate ceiling-level accuracy and pragmatic sensitivity. A separate group of LM grammaticality judgments previously characterized as incorrect are shown to correlate with human judgments, while certain reasoning models approximate idealized judgments when prompted to respond as an expert generative syntactician. Overall, the findings suggest that apparent deficits in LM performance may reflect task design, evaluation choices, and assumptions about human performance, rather than deficiencies in current models.

  • LLMs and people both learn to form conventions -- just not with each other

    Open MIND · 2026-02-09

    preprint

  • When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

    arXiv (Cornell University) · 2026-04-07

    preprint · Open access

    Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and if models adhere to their introspective reasoning are central challenges for trustworthy deployment. To study this, we introduce the Graded Color Attribution (GCA) dataset, a controlled benchmark designed to elicit decision rules and evaluate participant faithfulness to these rules. GCA consists of line drawings that vary pixel-level color coverage across three conditions: world-knowledge recolorings, counterfactual recolorings, and shapes with no color priors. Using GCA, both VLMs and human participants establish a threshold: the minimum percentage of pixels of a given color an object must have to receive that color label. We then compare these rules with their subsequent color attribution decisions. Our findings reveal that models systematically violate their own introspective rules. For example, GPT-5-mini violates its stated introspection rules in nearly 60% of cases on objects with strong color priors. Human participants remain faithful to their stated rules, with any apparent violations being explained by a well-documented tendency to overestimate color coverage. In contrast, we find that VLMs are excellent estimators of color coverage, yet blatantly contradict their own reasoning in their final responses. Across all models and strategies for eliciting introspective rules, world-knowledge priors systematically degrade faithfulness in ways that do not mirror human cognition. Our findings challenge the view that VLM reasoning failures are difficulty-driven and suggest that VLM introspective self-knowledge is miscalibrated, with direct implications for high-stakes deployment.

Frequent coauthors

  • Edward Gibson

    46 shared
  • Evelina Fedorenko

    Massachusetts Institute of Technology

    40 shared
  • Richard Futrell

    30 shared
  • Isabelle Dautriche

    Laboratoire de Psychologie Cognitive

    18 shared
  • Tiago Pimentel

    14 shared
  • Peter Graff

    14 shared
  • Ryan Cotterell

    13 shared
  • Jeremy Hartman

    13 shared