
David Bamman

· Below The Line Associate Professor · Verified

University of California, Berkeley · Department of Electrical Engineering and Computer Sciences

Active 2006–2025

h-index: 31
Citations: 3.9k
Papers: 106 (40 in the last 5 years)
Funding: $732k

About

David Bamman is an associate professor in the School of Information at the University of California, Berkeley. His work focuses on natural language processing (NLP) and cultural analytics, applying NLP and artificial intelligence (AI) to empirical questions in the humanities and social sciences. Bamman's research aims to improve computational methods for underserved domains such as literature, including projects like LitBank and BookNLP, and to develop new empirical approaches for studying literature, film, and culture. Prior to his appointment at Berkeley, he earned his PhD from the School of Computer Science at Carnegie Mellon University and worked as a senior researcher at the Perseus Project at Tufts University. His research has received support from the National Endowment for the Humanities, the National Science Foundation, and the Mellon Foundation, and he is a recipient of an NSF CAREER award.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Natural Language Processing
  • Psychology
  • Epistemology
  • Linguistics
  • Social psychology
  • History
  • Data science
  • Mathematics
  • Cognitive science
  • Philosophy
  • Communication

Selected publications

  • Tell, Don’t Show: Leveraging Language Models’ Abstractive Retellings to Model Literary Themes

    2025-01-01 · 1 citation

    article · Open access · Senior author

    Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.

  • Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes

    ArXiv.org · 2025-05-29

    preprint · Open access · Senior author

    Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.

  • Measuring the Stories in Contemporary Songs

    2025-08-21

    article · Open access · 1st author · Corresponding

    Lyric poetry--the poetry of song--is often defined in opposition to narrative. In this work, we examine this relationship by carrying out an empirical study to measure the degree of narrativity present in contemporary songs, using a dataset of popular (Billboard Hot 100) and prestigious (Grammy-nominated) songs spanning 1960-2024. While we might expect the 1960s (with ballad-driven folk singers like Joan Baez, Bob Dylan and Simon & Garfunkel) to be a high-water mark for narrativity, we find the opposite: narrativity has been steadily increasing over this period, largely due to the rise of the strongly narrative genres of hip hop and rap. We also find that it is a marker of prestige for country music, with Grammy-award nominated "Best Country" songs displaying significantly higher narrativity rates than non-nominated songs from the same album.

  • Culture is Not Trivia: Sociocultural Theory for Cultural NLP

    2025-01-01 · 3 citations

    article · Open access

    The field of cultural NLP has recently experienced rapid growth, driven by a pressing need to ensure that language technologies are effective and safe across a pluralistic user base. This work has largely progressed without a shared conception of culture, instead choosing to rely on a wide array of cultural proxies. However, this leads to a number of recurring limitations: coarse national boundaries fail to capture nuanced differences that lay within them, limited coverage restricts datasets to only a subset of usually highly-represented cultures, and a lack of dynamicity results in static cultural benchmarks that do not change as culture evolves. In this position paper, we argue that these methodological limitations are symptomatic of a theoretical gap. We draw on a well-developed theory of culture from sociocultural linguistics to fill this gap by 1) demonstrating in a case study how it can clarify methodological constraints and affordances, 2) offering theoretically-motivated paths forward to achieving cultural competence, and 3) arguing that localization is a more useful framing for the goals of much current work in cultural NLP.

  • Multimodal Conversation Structure Understanding

    ArXiv.org · 2025-05-23

    preprint · Open access · Senior author

    While multimodal large language models (LLMs) excel at dialogue, whether they can adequately parse the structure of conversation -- conversational roles and threading -- remains underexplored. In this work, we introduce a suite of tasks and release TV-MMPC, a new annotated dataset, for multimodal conversation structure understanding. Our evaluation reveals that while all multimodal LLMs outperform our heuristic baseline, even the best-performing model we consider experiences a substantial drop in performance when character identities of the conversation are anonymized. Beyond evaluation, we carry out a sociolinguistic analysis of 350,842 utterances in TVQA. We find that while female characters initiate conversations at rates in proportion to their speaking time, they are 1.2 times more likely than men to be cast as an addressee or side-participant, and the presence of side-participants shifts the conversational register from personal to social.

  • Measuring the Stories in Contemporary Songs

    2025-11-17

    book-chapter · Open access · 1st author · Corresponding

    Lyric poetry--the poetry of song--is often defined in opposition to narrative. In this work, we examine this relationship by carrying out an empirical study to measure the degree of narrativity present in contemporary songs, using a dataset of popular (Billboard Hot 100) and prestigious (Grammy-nominated) songs spanning 1960–2024. While we might expect the 1960s (with ballad-driven folk singers like Joan Baez, Bob Dylan and Simon & Garfunkel) to be a high-water mark for narrativity, we find the opposite: narrativity has been steadily increasing over this period, largely due to the rise of the strongly narrative genres of hip hop and rap. We also find that it is a marker of prestige for country music, with Grammy-award nominated “Best Country” songs displaying significantly higher narrativity rates than non-nominated songs from the same album.

  • “Othering” Through War: Depiction of Asians/Asian Americans in U.S. History Textbooks

    Educational Researcher · 2025-03-27 · 2 citations

    article · Open access · Senior author

    Using computational methods, we investigate a data set of 874,125 sentences from 30 U.S. history textbooks used in California and Texas schools to consider how they discuss Asians/Asian Americans. Only 1% of all sentences in our sample has any mention of Asians. Most of these sentences focus on Chinese and Japanese, and when individuals are named, they are usually White. The most prevalent topics in which Asians appear are about war. Discussions of wars are a centerpiece of history textbooks, but the dominance of such narratives is especially high for Asians relative to other ethno-cultural groups. The sentiment of verbs used to describe Asians is strikingly negative. Asians are described more negatively than others in both war and nonwar contexts.

  • Measuring the Stories in Contemporary Songs

    2025-11-04

    article · 1st author · Corresponding

    Lyric poetry--the poetry of song--is often defined in opposition to narrative. In this work, we examine this relationship by carrying out an empirical study to measure the degree of narrativity present in contemporary songs, using a dataset of popular (Billboard Hot 100) and prestigious (Grammy-nominated) songs spanning 1960-2024. While we might expect the 1960s (with ballad-driven folk singers like Joan Baez, Bob Dylan and Simon & Garfunkel) to be a high-water mark for narrativity, we find the opposite: narrativity has been steadily increasing over this period, largely due to the rise of the strongly narrative genres of hip hop and rap. We also find that it is a marker of prestige for country music, with Grammy-award nominated "Best Country" songs displaying significantly higher narrativity rates than non-nominated songs from the same album.

  • Culture is Not Trivia: Sociocultural Theory for Cultural NLP

    arXiv (Cornell University) · 2025-02-17

    preprint · Open access

    The field of cultural NLP has recently experienced rapid growth, driven by a pressing need to ensure that language technologies are effective and safe across a pluralistic user base. This work has largely progressed without a shared conception of culture, instead choosing to rely on a wide array of cultural proxies. However, this leads to a number of recurring limitations: coarse national boundaries fail to capture nuanced differences that lay within them, limited coverage restricts datasets to only a subset of usually highly-represented cultures, and a lack of dynamicity results in static cultural benchmarks that do not change as culture evolves. In this position paper, we argue that these methodological limitations are symptomatic of a theoretical gap. We draw on a well-developed theory of culture from sociocultural linguistics to fill this gap by 1) demonstrating in a case study how it can clarify methodological constraints and affordances, 2) offering theoretically-motivated paths forward to achieving cultural competence, and 3) arguing that localization is a more useful framing for the goals of much current work in cultural NLP.

  • ER-Reason: A Benchmark Dataset for LLM Clinical Reasoning in the Emergency Room

    ArXiv.org · 2025-05-28

    preprint · Open access

    Existing benchmarks for evaluating the clinical reasoning capabilities of large language models (LLMs) often lack a clear definition of "clinical reasoning" as a construct, fail to capture the full breadth of interdependent tasks within a clinical workflow, and rely on stylized vignettes rather than real-world clinical documentation. As a result, recent studies have found significant discrepancies between LLM performance on stylized benchmarks derived from medical licensing exams and their performance in real-world prospective studies. To address these limitations, we introduce ER-Reason, a benchmark designed to evaluate LLM reasoning as clinical evidence accumulates across decision-making tasks spanning the full workflow of emergency medicine. ER-Reason comprises 25,174 de-identified clinical notes from 3,437 patients, supporting evaluation across all stages of the emergency department workflow: triage intake, treatment selection, disposition planning, and final diagnosis. Crucially, evaluation in ER-Reason extends beyond diagnostic accuracy to include stepwise Script Concordance Test (SCT)-style questions grounded in real patient cases, which assess whether LLMs update their diagnostic beliefs in the correct direction and magnitude as clinical evidence accumulates, scored against 2,555 emergency physician annotations. We evaluate reasoning and non-reasoning LLMs on ER-Reason, and show that our tasks provide a more nuanced view of how LLM reasoning fails on real patient cases than existing benchmarks allow.

Frequent coauthors

  • Gregory Crane

    22 shared
  • Jon Gillick

    University of California, Berkeley

    17 shared
  • Noah A. Smith

    12 shared
  • Li Lucy

    11 shared
  • Kent K. Chang

    9 shared
  • Sandeep Soni

    9 shared
  • Matthew Sims

    7 shared
  • Alison Babeu

    6 shared

Labs

  • Berkeley AI Research Lab (BAIR) · PI

    Natural language processing and cultural analytics, applying NLP and AI to empirical questions in the humanities and social sciences.

Education

  • Ph.D., Computer Science

    University of California, Berkeley

    1990
  • M.S., Computer Science

    University of California, Berkeley

    1986
  • B.S., Computer Science

    University of California, Berkeley

    1983

Awards & honors

  • Hellman Fellow (2019)
  • Amazon Research Award (2017)