David Bamman
Associate Professor · University of California, Berkeley · Department of Electrical Engineering and Computer Sciences
Active 2006–2025
About
David Bamman is an associate professor in the School of Information at the University of California, Berkeley. His work focuses on natural language processing (NLP) and cultural analytics, applying NLP and artificial intelligence (AI) to empirical questions in the humanities and social sciences. Bamman's research aims to improve computational methods for underserved domains such as literature, through projects like LitBank and BookNLP, and to develop new empirical approaches for studying literature, film, and culture. Before his appointment at Berkeley, he earned his PhD from the School of Computer Science at Carnegie Mellon University and worked as a senior researcher at the Perseus Project at Tufts University. His research has been supported by the National Endowment for the Humanities, the National Science Foundation, and the Mellon Foundation, and he is a recipient of an NSF CAREER award.
Research topics
- Artificial Intelligence
- Computer Science
- Natural Language Processing
- Psychology
- Epistemology
- Linguistics
- Social psychology
- History
- Data science
- Mathematics
- Cognitive science
- Philosophy
- Communication
Selected publications
Tell, Don’t Show: Leveraging Language Models’ Abstractive Retellings to Model Literary Themes
2025-01-01 · 1 citation
article · Open access · Senior author
Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.
Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes
ArXiv.org · 2025-05-29
preprint · Open access · Senior author
Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.
Measuring the Stories in Contemporary Songs
2025-08-21
article · Open access · 1st author · Corresponding
Lyric poetry--the poetry of song--is often defined in opposition to narrative. In this work, we examine this relationship by carrying out an empirical study to measure the degree of narrativity present in contemporary songs, using a dataset of popular (Billboard Hot 100) and prestigious (Grammy-nominated) songs spanning 1960-2024. While we might expect the 1960s (with ballad-driven folk singers like Joan Baez, Bob Dylan and Simon & Garfunkel) to be a high-water mark for narrativity, we find the opposite: narrativity has been steadily increasing over this period, largely due to the rise of the strongly narrative genres of hip hop and rap. We also find that it is a marker of prestige for country music, with Grammy-award nominated "Best Country" songs displaying significantly higher narrativity rates than non-nominated songs from the same album.
Culture is Not Trivia: Sociocultural Theory for Cultural NLP
2025-01-01 · 3 citations
article · Open access
The field of cultural NLP has recently experienced rapid growth, driven by a pressing need to ensure that language technologies are effective and safe across a pluralistic user base. This work has largely progressed without a shared conception of culture, instead choosing to rely on a wide array of cultural proxies. However, this leads to a number of recurring limitations: coarse national boundaries fail to capture nuanced differences that lay within them, limited coverage restricts datasets to only a subset of usually highly-represented cultures, and a lack of dynamicity results in static cultural benchmarks that do not change as culture evolves. In this position paper, we argue that these methodological limitations are symptomatic of a theoretical gap. We draw on a well-developed theory of culture from sociocultural linguistics to fill this gap by 1) demonstrating in a case study how it can clarify methodological constraints and affordances, 2) offering theoretically-motivated paths forward to achieving cultural competence, and 3) arguing that localization is a more useful framing for the goals of much current work in cultural NLP.
Multimodal Conversation Structure Understanding
ArXiv.org · 2025-05-23
preprint · Open access · Senior author
While multimodal large language models (LLMs) excel at dialogue, whether they can adequately parse the structure of conversation -- conversational roles and threading -- remains underexplored. In this work, we introduce a suite of tasks and release TV-MMPC, a new annotated dataset, for multimodal conversation structure understanding. Our evaluation reveals that while all multimodal LLMs outperform our heuristic baseline, even the best-performing model we consider experiences a substantial drop in performance when character identities of the conversation are anonymized. Beyond evaluation, we carry out a sociolinguistic analysis of 350,842 utterances in TVQA. We find that while female characters initiate conversations at rates in proportion to their speaking time, they are 1.2 times more likely than men to be cast as an addressee or side-participant, and the presence of side-participants shifts the conversational register from personal to social.
Measuring the Stories in Contemporary Songs
2025-11-17
book-chapter · Open access · 1st author · Corresponding
Lyric poetry--the poetry of song--is often defined in opposition to narrative. In this work, we examine this relationship by carrying out an empirical study to measure the degree of narrativity present in contemporary songs, using a dataset of popular (Billboard Hot 100) and prestigious (Grammy-nominated) songs spanning 1960–2024. While we might expect the 1960s (with ballad-driven folk singers like Joan Baez, Bob Dylan and Simon & Garfunkel) to be a high-water mark for narrativity, we find the opposite: narrativity has been steadily increasing over this period, largely due to the rise of the strongly narrative genres of hip hop and rap. We also find that it is a marker of prestige for country music, with Grammy-award nominated “Best Country” songs displaying significantly higher narrativity rates than non-nominated songs from the same album.
“Othering” Through War: Depiction of Asians/Asian Americans in U.S. History Textbooks
Educational Researcher · 2025-03-27 · 2 citations
article · Open access · Senior author
Using computational methods, we investigate a data set of 874,125 sentences from 30 U.S. history textbooks used in California and Texas schools to consider how they discuss Asians/Asian Americans. Only 1% of all sentences in our sample has any mention of Asians. Most of these sentences focus on Chinese and Japanese, and when individuals are named, they are usually White. The most prevalent topics in which Asians appear are about war. Discussions of wars are a centerpiece of history textbooks, but the dominance of such narratives is especially high for Asians relative to other ethno-cultural groups. The sentiment of verbs used to describe Asians is strikingly negative. Asians are described more negatively than others in both war and nonwar contexts.
Measuring the Stories in Contemporary Songs
2025-11-04
article · 1st author · Corresponding
Lyric poetry--the poetry of song--is often defined in opposition to narrative. In this work, we examine this relationship by carrying out an empirical study to measure the degree of narrativity present in contemporary songs, using a dataset of popular (Billboard Hot 100) and prestigious (Grammy-nominated) songs spanning 1960-2024. While we might expect the 1960s (with ballad-driven folk singers like Joan Baez, Bob Dylan and Simon & Garfunkel) to be a high-water mark for narrativity, we find the opposite: narrativity has been steadily increasing over this period, largely due to the rise of the strongly narrative genres of hip hop and rap. We also find that it is a marker of prestige for country music, with Grammy-award nominated "Best Country" songs displaying significantly higher narrativity rates than non-nominated songs from the same album.
Culture is Not Trivia: Sociocultural Theory for Cultural NLP
arXiv (Cornell University) · 2025-02-17
preprint · Open access
The field of cultural NLP has recently experienced rapid growth, driven by a pressing need to ensure that language technologies are effective and safe across a pluralistic user base. This work has largely progressed without a shared conception of culture, instead choosing to rely on a wide array of cultural proxies. However, this leads to a number of recurring limitations: coarse national boundaries fail to capture nuanced differences that lay within them, limited coverage restricts datasets to only a subset of usually highly-represented cultures, and a lack of dynamicity results in static cultural benchmarks that do not change as culture evolves. In this position paper, we argue that these methodological limitations are symptomatic of a theoretical gap. We draw on a well-developed theory of culture from sociocultural linguistics to fill this gap by 1) demonstrating in a case study how it can clarify methodological constraints and affordances, 2) offering theoretically-motivated paths forward to achieving cultural competence, and 3) arguing that localization is a more useful framing for the goals of much current work in cultural NLP.
ER-Reason: A Benchmark Dataset for LLM Clinical Reasoning in the Emergency Room
ArXiv.org · 2025-05-28
preprint · Open access
Existing benchmarks for evaluating the clinical reasoning capabilities of large language models (LLMs) often lack a clear definition of "clinical reasoning" as a construct, fail to capture the full breadth of interdependent tasks within a clinical workflow, and rely on stylized vignettes rather than real-world clinical documentation. As a result, recent studies have found significant discrepancies between LLM performance on stylized benchmarks derived from medical licensing exams and their performance in real-world prospective studies. To address these limitations, we introduce ER-Reason, a benchmark designed to evaluate LLM reasoning as clinical evidence accumulates across decision-making tasks spanning the full workflow of emergency medicine. ER-Reason comprises 25,174 de-identified clinical notes from 3,437 patients, supporting evaluation across all stages of the emergency department workflow: triage intake, treatment selection, disposition planning, and final diagnosis. Crucially, evaluation in ER-Reason extends beyond diagnostic accuracy to include stepwise Script Concordance Test (SCT)-style questions grounded in real patient cases, which assess whether LLMs update their diagnostic beliefs in the correct direction and magnitude as clinical evidence accumulates, scored against 2,555 emergency physician annotations. We evaluate reasoning and non-reasoning LLMs on ER-Reason, and show that our tasks provide a more nuanced view of how LLM reasoning fails on real patient cases than existing benchmarks allow.
Recent grants
III: Small: Collaborative Research: Building Subjective Knowledge Bases by Modeling Viewpoints
NSF · $266k · 2018–2022
CAREER: Using Fiction to Improve Real-World Information Systems
NSF · $466k · 2020–2025
Frequent coauthors
- 22 shared
Gregory Crane
- 17 shared
Jon Gillick
University of California, Berkeley
- 12 shared
Noah A. Smith
- 11 shared
Li Lucy
- 9 shared
Kent K. Chang
- 9 shared
Sandeep Soni
- 7 shared
Matthew Sims
- 6 shared
Alison Babeu
Labs
Berkeley AI Research Lab (BAIR) · PI
Natural language processing and cultural analytics, applying NLP and AI to empirical questions in the humanities and social sciences.
Education
- Ph.D., Computer Science
Carnegie Mellon University
- 1986
M.S., Computer Science
University of California, Berkeley
- 1983
B.S., Computer Science
University of California, Berkeley
Awards & honors
- Hellman Fellow (2019)
- Amazon Research Award (2017)