Meghan Sumner

Associate Professor of Linguistics · Verified

Stanford University · Linguistics

Active 2003–2026

h-index: 10

Citations: 1.1k

Papers465 last 5y

Funding: $880k

Faculty page

Research topics

Computer Science
Speech recognition
Psychology
Audiology
Cognitive psychology
Artificial Intelligence
Natural Language Processing
Communication
Medicine
Neuroscience

Selected publications

Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition
ArXiv.org · 2026-01-11
article · Open access
In speech language modeling, two architectures dominate the frontier: the Transformer and the Conformer. However, it remains unknown whether their comparable performance stems from convergent processing strategies or distinct architectural inductive biases. We introduce Architectural Fingerprinting, a probing framework that isolates the effect of architecture on representation, and apply it to a controlled suite of 24 pre-trained encoders (39M-3.3B parameters). Our analysis reveals divergent hierarchies: Conformers implement a "Categorize Early" strategy, resolving phoneme categories 29% earlier in depth and speaker gender by 16% depth. In contrast, Transformers "Integrate Late," deferring phoneme, accent, and duration encoding to deep layers (49-57%). These fingerprints suggest design heuristics: Conformers' front-loaded categorization may benefit low-latency streaming, while Transformers' deep integration may favor tasks requiring rich context and cross-utterance normalization.
Publisher OA PDF
Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition
arXiv (Cornell University) · 2026-01-11
preprint · Open access
Publisher DOI
Indexical weight is distributed: Evidence from social evaluations of /s/
Proceedings of the Linguistic Society of America · 2026-05-08
article · Open access
Though it is well established that listeners can infer social meanings from linguistic variation, less is known about how listeners use properties of the linguistic signal to form such indexical relationships. We examine this question using the well-documented association between /s/ center of gravity (CoG) and perceived masculinity. Using a matched guise paradigm, we ask how two properties of the phonetic signal, the F2 of the vowel following /s/ and speaker voice, modulate listeners' masculinity evaluations of sibilant acoustics. Furthermore, we investigate whether this modulation depends on the variability of the /s/ tokens listeners hear within the experimental context by comparing listeners who heard a single invariant /s/ token per categorical /s/ CoG condition (fronted, mid, backed) with those who heard /s/ tokens that varied with the phonological environment. Bayesian modeling confirmed that high-CoG (fronted) /s/ is typically perceived as less masculine than low-CoG (backed) /s/, replicating prior findings. Critically, masculinity judgments were primarily influenced by speaker voice, with /s/ CoG and F2 of the following vowel serving as secondary, modulating cues to speaker masculinity. These findings suggest that social evaluation reflects integration of the full available signal rather than extraction of a single variable.
Publisher OA PDF DOI
In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
ArXiv.org · 2025-05-20
preprint · Open access
Human listeners readily adjust to unfamiliar speakers and language varieties through exposure, but do these adaptation benefits extend to state-of-the-art spoken language models? We introduce a scalable framework that allows for in-context learning (ICL) in Phi-4 Multimodal using interleaved task prompts and audio-text pairs, and find that as few as 12 example utterances (~50 seconds) at inference time reduce word error rates by a relative 19.7% (1.2 pp.) on average across diverse English corpora. These improvements are most pronounced in low-resource varieties, when the context and target speaker match, and when more examples are provided--though scaling our procedure yields diminishing marginal returns to context length. Overall, we find that our novel ICL adaptation scheme (1) reveals a similar performance profile to human listeners, and (2) demonstrates consistent improvements to automatic speech recognition (ASR) robustness across diverse speakers and language backgrounds. While adaptation succeeds broadly, significant gaps remain for certain varieties, revealing where current models still fall short of human flexibility. We release our prompts and code on GitHub.
Publisher OA PDF DOI
Sublexical ARTifacts: Bottom-up Interference in a Lexical Category Search
Underline Science Inc. · 2025-06-18
other · Open access · Senior author
How listeners adapt to unfamiliar talkers and accents is a central question in psycholinguistics. In this study, we explored how listeners dynamically shift mappings from acoustic information to mental representations after hearing a new talker via novel eye-tracking methods. We tested a prediction from Adaptive Resonance Theory (ART) that an anomaly in the signal (in this case, a change in talker) increases the influence of bottom-up relative to top-down information, creating an environment where sublexical competitors (e.g. 'Arch' within 'Archer') would be more likely to interfere with lexical access for the target. In two experiments (Exp. 1: General American English [GA] talkers; Exp. 2: GA and Spanish-accented [SP] talkers), this prediction was supported via analyses of accuracy, latency, and gaze. In Exp. 2, we found that the effect replicated but did not differ based on the accent of the talker. The data suggest new paths forward in speech adaptation research.
Publisher DOI
In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
2025-01-01 · 2 citations
article · Open access
Publisher OA PDF DOI
The Text Aphasia Battery (TAB): A Clinically-Grounded Benchmark for Aphasia-Like Deficits in Language Models
ArXiv.org · 2025-11-25
preprint · Open access
Large language models (LLMs) have emerged as a candidate "model organism" for human language, offering an unprecedented opportunity to study the computational basis of linguistic disorders like aphasia. However, traditional clinical assessments are ill-suited for LLMs, as they presuppose human-like pragmatic pressures and probe cognitive processes not inherent to artificial architectures. We introduce the Text Aphasia Battery (TAB), a text-only benchmark adapted from the Quick Aphasia Battery (QAB) to assess aphasic-like deficits in LLMs. The TAB comprises four subtests: Connected Text, Word Comprehension, Sentence Comprehension, and Repetition. This paper details the TAB's design, subtests, and scoring criteria. To facilitate large-scale use, we validate an automated evaluation protocol using Gemini 2.5 Flash, which achieves reliability comparable to expert human raters (prevalence-weighted Cohen's kappa = 0.255 for model–consensus agreement vs. 0.286 for human–human agreement). We release TAB as a clinically-grounded, scalable framework for analyzing language deficits in artificial systems.
Publisher OA PDF DOI
Talker-specificity beyond the lexicon: Recognition memory for spoken sentences
Psychonomic Bulletin & Review · 2025-08-27
article · Senior author
Publisher DOI
Talker-based asymmetries in memory for spoken sentences
The Journal of the Acoustical Society of America · 2025-04-01
article · Senior author
It is well-established that memory plays a central role in the human ability to understand speech, but not all experiences with speech are remembered equally well. One hypothesis about how these asymmetries emerge is that representation strength depends on how listeners allocate cognitive resources to the speech signal, partially based on the social characteristics of the talker. To test this, we conducted three recognition memory experiments with 12 diverse, but roughly equally non-standard talkers (i.e., no speakers of mainstream American English). We manipulated attention at encoding, as well as retrieval modality. Participants heard spoken sentences at study with different test blocks: auditorily presented sentences (Exp. 1), orthographically presented sentences (Exp. 2), and images (Exp. 3). In all three experiments, memory was stronger in Full than Divided Attention. Crucially, we also found that memory performance depended heavily on the talker and that talker interacted with voice of repetition (Exp. 1) and attention (Exps. 1, 2, and 3) in complex ways. These results point to a highly dynamic, context-sensitive network of speech representations where encoding and recognition behaviors are patterned by resource allocation in addition to frequency and typicality. We discuss implications for understanding voice-based biases in everyday situations.
Publisher DOI
Speech patterns during memory recall relates to early tau burden across adulthood
Alzheimer's & Dementia · 2024 · 22 citations
- Psychology
- Audiology
- Cognitive psychology
INTRODUCTION: Early cognitive decline may manifest in subtle differences in speech. METHODS: We examined 238 cognitively unimpaired adults from the Framingham Heart Study (32-75 years) who completed amyloid and tau PET imaging. Speech patterns during delayed recall of a story memory task were quantified via five speech markers, and their associations with global amyloid status and regional tau signal were examined. RESULTS: Total utterance time, number of between-utterance pauses, speech rate, and percentage of unique words significantly correlated with delayed recall score although the shared variance was low (2%-15%). Delayed recall score was not significantly different between β-amyloid-positive (Aβ+) and -negative (Aβ-) groups and was not associated with regional tau signal. However, longer and more between-utterance pauses, and slower speech rate were associated with increased tau signal across medial temporal and early neocortical regions. DISCUSSION: Subtle speech changes during memory recall may reflect cognitive impairment associated with early Alzheimer's disease pathology. HIGHLIGHTS: Speech during delayed memory recall relates to tau PET signal across adulthood. Delayed memory recall score was not associated with tau PET signal. Speech shows greater sensitivity to detecting subtle cognitive changes associated with early tau accumulation. Our cohort spans adulthood, while most PET imaging studies focus on older adults.
Publisher OA PDF DOI

Recent grants

NIH Grant F32MH068204
NIH · $131k · 2006
The perception, representation, and use of non-native voicing cues
NSF · $349k · 2007–2011
Understanding the perception and recognition of spoken words: Effects of phonetics, phonological variation, and speech mode
NSF · $400k · 2012–2018

Frequent coauthors

Arthur G. Samuel
Basque Center on Cognition, Brain and Language
7 shared
Seung Kyung Kim
University of Utah
6 shared
Ed King
Stanford University
5 shared
Marisa Tice
Stanford University
4 shared
Kevin B. McGowan
4 shared
William Clapp
Stanford University
3 shared
Marie-Catherine de Marneffe
3 shared
John M. Tomlinson
Humboldt-Universität zu Berlin
3 shared
