Claire Bowern

Professor & Coordinator (NAIS Certificate) · Verified

Yale University · Department of Linguistics

Active 1999–2026

h-index: 42

Citations: 7.1k

Papers: 222 (61 in the last 5 years)

Funding: $1.5M

About

Claire Bowern is a professor in the Department of Linguistics at Yale University. Her research covers language documentation, linguistic fieldwork, and the study of language and cultural evolution. She has advised numerous students and postdoctoral researchers, and her publications address morphosyntax and the documentation of diverse languages.

Research topics

Computer Science
Linguistics
Philosophy
Psychology
Speech recognition
Programming language
Artificial Intelligence
Sociology
Biology
Machine Learning
Acoustics
Geography
Anthropology
Cognitive psychology
Physics

Selected publications

D-PLACE aggregated dataset
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-13
dataset · Open access
Cite the source of the dataset as: Kathryn R. Kirby, Russell D. Gray, Simon J. Greenhill, Fiona M. Jordan, Stephanie Gomes-Ng, Hans-Jörg Bibiko, Damián E. Blasi, Carlos A. Botero, Claire Bowern, Carol R. Ember, Dan Leehr, Bobbi S. Low, Joe McCarter, William Divale, and Michael C. Gavin. (2016). D-PLACE: A Global Database of Cultural, Linguistic and Environmental Diversity. PLoS ONE, 11(7): e0158391. doi:10.1371/journal.pone.0158391.
Kinbank: A global database of kinship terminology
Zenodo (CERN European Organization for Nuclear Research) · 2026-02-06
dataset · Open access
The data repository for the Kinbank dataset
Demographic shifts, inter-group contact and environmental conditions drive language extinction and diversification
Proceedings of the Royal Society B Biological Sciences · 2026-01-28
article · Open access
Humans collectively use thousands of languages. The number of languages in a region (i.e. 'richness') varies widely. Empirical research has identified social, environmental, geographic and demographic factors associated with language richness. However, our understanding of causal mechanisms and variation in their effects over space has been limited by prior analyses focusing on correlation and assuming stationarity. Here we use process-based, spatially explicit stochastic models to simulate the emergence, expansion, contraction, fragmentation and extinction of language ranges. We varied parameter settings in these computer-simulated experiments to evaluate the extent to which different processes reproduce observed patterns of language richness in North America. We find that the majority of spatial variation in language richness is explained by models in which environmental and social constraints determine population density, random shocks alter population sizes more frequently at higher population densities, and population shocks are more frequently negative than positive. Language diversification occurs when populations split after reaching size limits, and when ranges fragment due to population contractions following negative shocks or due to contact with other groups expanding following positive shocks. These findings support theories arguing that environmental and social conditions, constraints on group sizes, outcomes of contact and shifting demographics all shape language richness.
A Tutorial for Video in Spoken Language Documentation
Edinburgh Research Explorer (University of Edinburgh) · 2026-01-01
article · Open access · Senior author
Spoken language always goes along with meaningful visible behavior, such as gesture and eye gaze. But while language use is multimodal, published recommendations and formal training in spoken language documentation tend to focus almost exclusively on the audio part of the signal. Therefore, this tutorial provides a practical guide to using video as part of a spoken language documentation project. We motivate why these projects should consider recording video, and we then describe the equipment needs, recording setups, and post-processing workflow required for collecting transcribable video. We also discuss the unique ethical/privacy concerns raised by video recording and archiving. Overall, our goal is to centralize and formalize the recommendations about video that have long circulated in oral form, or as grey literature, in documentation circles. The scripts in the supplementary materials are maintained at https://github.com/amaliaskilton/auto-ffmpeg.
Comparing Phonological Feature Sets for Low-Resource ASR
University of Massachusetts (UMass) Amherst · 2026-03-14
article · Open access · Senior author
In this paper, we explore an alternative ASR framework in which phonological features are predicted as an explicit intermediate representation, rather than predicting phones directly. Because feature systems encode cross-linguistically meaningful structure, this intermediate representation can reduce sample complexity by constraining what must be learned from limited data, while also enabling rapid adaptation to new languages through changes to the phone-to-feature mapping rather than retraining the model. As a result, this approach is particularly well suited to low-resource settings. We retrained Phonet models on two different feature sets to see the extent to which specific theories of phonological features facilitate better phoneme recognition, using a low-resourced language (Yan-nhangu, Pama-Nyungan) as a testing ground for performance. We use a naïve greedy decoding strategy to isolate the effect of feature set choice, and find that IPA features lead to the best transcription accuracy, followed closely by a featureless baseline.
Phlorest phylogeny derived from Bouckaert et al. 2018 'The origin and expansion of Pama–Nyungan languages across Australia'
Zenodo (CERN European Organization for Nuclear Research) · 2025-11-11
dataset · Open access
Cite the source of the dataset as: Bouckaert RR, Bowern C & Atkinson QD. 2018. The origin and expansion of Pama–Nyungan languages across Australia. Nature Ecology and Evolution. 2: 741–749
Phlorest phylogeny derived from Bowern & Atkinson 2012 'Computational phylogenetics and the internal structure of Pama-Nyungan'
Zenodo (CERN European Organization for Nuclear Research) · 2025-11-11
dataset · Open access · 1st author · Corresponding
Cite the source of the dataset as: Bowern C & Atkinson QD. 2012. Computational phylogenetics and the internal structure of Pama-Nyungan. Language, 88(4), 817-845.
Australian archaeolinguistics
Oxford University Press eBooks · 2025-07-22
book-chapter · 1st author · Corresponding
This chapter discusses the linguistic, genetic, and archaeological stories of the Indigenous peoples of the area now known as Australia (the southern portion of Sahul). When attempting to synthesize information from genetics, archaeology, and language for the deep past of Sahul, we are confronted with several seeming contradictions. On the one hand, the picture from genetics emphasizes continuity: rapid and early expansion (above 40,000 years ago), followed by fairly stable regionalism and some subsequent gene flow. The linguistic picture, however, appears to show a heavy disjunction, with one family, Pama-Nyungan, spreading and replacing most of the languages of almost 90% of the continent within the past 7,000 years. The material record shows a combination of stability, regionalism, and shift. This chapter explores some of these questions.
Linguistically Informed Tokenization Improves ASR for Underresourced Languages
ArXiv.org · 2025-10-07
preprint · Open access · Senior author
Automatic speech recognition (ASR) is a crucial tool for linguists aiming to perform a variety of language documentation tasks. However, modern ASR systems use data-hungry transformer architectures, rendering them generally unusable for underresourced languages. We fine-tune a wav2vec2 ASR model on Yan-nhangu, a dormant Indigenous Australian language, comparing the effects of phonemic and orthographic tokenization strategies on performance. In parallel, we explore ASR's viability as a tool in a language documentation pipeline. We find that a linguistically informed phonemic tokenization system substantially improves WER and CER compared to a baseline orthographic tokenization scheme. Finally, we show that hand-correcting the output of an ASR model is much faster than hand-transcribing audio from scratch, demonstrating that ASR can work for underresourced languages.
Diachrony and Diachronica
Diachronica · 2025-05-23 · 2 citations
article · 1st author · Corresponding

Recent grants

Dynamics of Hunter-Gatherer Language Change
NSF · $718k · 2008–2014
Language as a Window on Prehistory
NSF · $382k · 2014–2020
CAREER: Pama-Nyungan Reconstruction and the Prehistory of Australia
NSF · $407k · 2008–2014
The Language of Bardi (BCJ) Precontact Narratives
NSF · $13k · 2008–2012

Frequent coauthors

Russell D. Gray
University of Auckland
51 shared
Simon J. Greenhill
University of Auckland
48 shared
Michael C. Gavin
Colorado State University
46 shared
Damián E. Blasi
45 shared
Kathryn R. Kirby
42 shared
Hannah J. Haynie
Kent State University
32 shared
Fiona M. Jordan
31 shared
Jakob Lesage
29 shared

Labs

Language and Brain Lab · PI

Education

Ph.D., Linguistics
Harvard University
2004
