Dani Byrd

Professor of Linguistics

University of Southern California · Linguistics

Active 1990–2026

h-index: 39

Citations: 6.0k

Papers: 160 (17 in the last 5 years)

Funding: $6.5M



About

Dani Byrd is a Professor in the Department of Linguistics at the University of Southern California, within the Dornsife College of Letters, Arts and Sciences. Her research focuses on phonetics and phonology, and she participates in research groups including the USC Phonetics & Phonology Group and the USC SPAN Research Group. Her work addresses speech, words, and the mind, and she is the author of an introductory textbook, 'Discovering Speech, Words, and Mind.' She teaches courses in phonetics and phonology and provides resources and support for students and colleagues in the field.

Research topics

Artificial Intelligence
Computer Science
Speech recognition
Physics
Linguistics
Computer vision
Mathematics

Selected publications

Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
Open MIND · 2026-01-20
preprint
Many spoken languages, including English, exhibit wide variation in dialects and accents, making accent control an important capability for flexible text-to-speech (TTS) models. Current TTS systems typically generate accented speech by conditioning on speaker embeddings associated with specific accents. While effective, this approach offers limited interpretability and controllability, as embeddings also encode traits such as timbre and emotion. In this study, we analyze the interaction between speaker embeddings and linguistically motivated phonological rules in accented speech synthesis. Using American and British English as a case study, we implement rules for flapping, rhoticity, and vowel correspondences. We propose the phoneme shift rate (PSR), a novel metric quantifying how strongly embeddings preserve or override rule-based transformations. Experiments show that combining rules with embeddings yields more authentic accents, while embeddings can attenuate or overwrite rules, revealing entanglement between accent and speaker identity. Our findings highlight rules as a lever for accent control and a framework for evaluating disentanglement in speech generation.
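The abstract above names rule-based transformations for flapping and rhoticity applied before synthesis. A minimal sketch of what such phoneme-level rewrite rules can look like follows, assuming ARPAbet-style phoneme lists with stress digits on vowels. The function names, the vowel inventory, and the simplified rule conditions are hypothetical illustrations, not the paper's actual rule set, and vowel correspondences (e.g. ER0 in a non-rhotic accent) are left out.

```python
# Hypothetical sketch: simplified flapping and non-rhoticity rules over
# ARPAbet-style phoneme lists. Not the paper's actual rule set.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}


def strip_stress(p: str) -> str:
    """Drop the ARPAbet stress digit, e.g. 'AA1' -> 'AA'."""
    return p.rstrip("012")


def is_vowel(p: str) -> bool:
    return strip_stress(p) in VOWELS


def apply_flapping(phones: list[str]) -> list[str]:
    """American-style flapping (simplified): /T/ or /D/ after a vowel and
    before an unstressed vowel becomes the flap 'DX'."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        nxt = phones[i + 1]
        if (phones[i] in {"T", "D"} and is_vowel(phones[i - 1])
                and is_vowel(nxt) and nxt.endswith("0")):
            out[i] = "DX"
    return out


def apply_nonrhotic(phones: list[str]) -> list[str]:
    """British-style non-rhoticity (simplified): delete /R/ unless a vowel follows."""
    return [p for i, p in enumerate(phones)
            if not (p == "R" and (i + 1 == len(phones) or not is_vowel(phones[i + 1])))]


print(apply_flapping(["W", "AO1", "T", "ER0"]))              # "water"
print(apply_nonrhotic(["K", "AA1", "R", "P", "AA1", "R", "K"]))  # "car park"
```

Keeping each rule as a pure function over a phoneme list makes it easy to switch rules on or off and compare the synthesized output with and without them, which is the kind of contrast a metric like PSR is meant to quantify.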
An Approach to Simultaneous Acquisition of Real-Time MRI Video, EEG, and Surface EMG for Articulatory, Brain, and Muscle Activity During Speech Production
Open MIND · 2026-03-05
preprint
Speech production is a complex process spanning neural planning, motor control, muscle activation, and articulatory kinematics. While the acoustic speech signal is the most accessible product of the speech production act, it does not directly reveal its causal neurophysiological substrates. We present the first simultaneous acquisition of real-time (dynamic) MRI, EEG, and surface EMG, capturing several key aspects of the speech production chain: brain signals, muscle activations, and articulatory movements. This multimodal acquisition paradigm presents substantial technical challenges, including MRI-induced electromagnetic interference and myogenic artifacts. To mitigate these, we introduce an artifact suppression pipeline tailored to this tri-modal setting. Once fully developed, this framework is poised to offer an unprecedented window into speech neuroscience and insights leading to brain-computer interface advances.
Learning-free L2-Accented Speech Generation using Phonological Rules
ArXiv.org · 2026-03-08
article · Open access
Accent plays a crucial role in speaker identity and inclusivity in speech technologies. Existing accented text-to-speech (TTS) systems either require large-scale accented datasets or lack fine-grained phoneme-level controllability. We propose an accented TTS framework that combines phonological rules with a multilingual TTS model. The rules are applied to phoneme sequences to transform accent at the phoneme level while preserving intelligibility. The method requires no accented training data and enables explicit phoneme-level accent manipulation. We design rule sets for Spanish- and Indian-accented English, modeling systematic differences in consonants, vowels, and syllable structure arising from phonotactic constraints. We analyze the trade-off between phoneme-level duration alignment and accent as realized in speech timing. Experimental results demonstrate effective accent shift while maintaining speech quality.
Deep learning characterizes depression and suicidal ideation in young adults from eye movements
npj Digital Medicine · 2026-03-28 · 1 citation
article · Open access
Objective biobehavioral markers for mental health conditions remain elusive, with diagnosis typically relying on self-reports and clinical interviews. We investigate eye tracking as a potential marker of attentional and mood biases associated with symptoms of depression and suicidal ideation from self-reported screening questionnaires. We analyze eye movements from 126 young adults during reading and responding to emotionally loaded sentences. A deep learning framework was designed to account for intra-trial and inter-trial variations in eye movements, achieving an AUC of 0.793 (95% CI: 0.766-0.819) for identifying depression/suicidality against healthy controls, and 0.826 (95% CI: 0.798-0.853) for suicidality specifically. The model also exhibited moderate accuracy in differentiating depressed from suicidal individuals (AUC: 0.609, 95% CI: 0.569-0.646). Discriminative patterns were more pronounced during response generation and for stimuli of negative sentiment. These findings suggest that eye tracking can provide objective markers of self-reported symptom severity by measuring the impact of emotional stimuli on oculomotor control.
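The eye-movement study above reports AUC values with 95% confidence intervals. As a reminder of what those numbers measure, here is a minimal sketch that computes ROC AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), plus a percentile bootstrap interval. The abstract does not say how its intervals were computed; the bootstrap, the function names, and the default parameters are assumptions for illustration only.

```python
import random


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """ROC AUC via the Mann-Whitney pairwise statistic; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(pos: list[float], neg: list[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap CI (an assumed method), resampling each class separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        bp = [rng.choice(pos) for _ in pos]
        bn = [rng.choice(neg) for _ in neg]
        stats.append(auc(bp, bn))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


print(auc([0.9, 0.4], [0.5, 0.1]))  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation, which is why 0.609 for depressed versus suicidal individuals is described as moderate.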
Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
2026-04-21
article
Articulatory kinematics of penultimate and final lengthening in Setswana: Evidence from real-time MRI
2026-05-14
article · Open access
The current real-time vocal tract MRI study examines the articulatory encoding of prosodic boundary, prominence and their interaction through kinematic analysis of penultimate and final lengthening near an intonational phrase (IP) boundary in Setswana. One hypothesis is that penultimate lengthening represents a specific case of final lengthening initiated on the IP-penultimate position. Alternatively, penultimate lengthening and final lengthening may result from the interaction between phrase-level prominence and boundary events. Our results reveal two phases of lengthening in the IP-penultimate and IP-final positions. Displacement and peak velocity are also greater IP-finally than IP-medially, but boundary-related increase in displacement and peak velocity only shows a single progressive trend approaching the final IP boundary, with no IP-penultimate alterations comparable to durational patterns. Additionally, there is some evidence for greater duration, displacement and peak velocity of initial consonant gestures on word-penultimate syllables than on word-final ones regardless of utterance positions, indicating a possible word-penultimate prominence effect. These findings suggest that penultimate and final lengthening in Setswana are better understood as the interaction between disparate prominence and boundary events. The results are interpreted according to a prosodic gestural approach that posits the coordination of a phrasal-prominence-encoding gesture and a boundary-encoding gesture.
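The Setswana study compares displacement and peak velocity of articulatory gestures across positions. A minimal sketch of how these two measures can be derived from a one-dimensional articulator trajectory sampled at a fixed frame rate follows. It is an assumption-laden simplification: displacement is taken here as the range of the trajectory and velocity from first differences, whereas the study's actual gesture landmarks, smoothing, and units are not given in the abstract. The function name and the frame-rate parameter are hypothetical.

```python
def gesture_kinematics(positions: list[float], fps: float) -> tuple[float, float]:
    """Return (displacement, peak speed) for a 1-D trajectory sampled at fps frames/s.

    Simplifications (assumed): displacement = max - min of the trajectory;
    velocity = first difference times fps, with no smoothing.
    Requires at least two samples.
    """
    displacement = max(positions) - min(positions)
    velocities = [(b - a) * fps for a, b in zip(positions, positions[1:])]
    peak_velocity = max(abs(v) for v in velocities)
    return displacement, peak_velocity


print(gesture_kinematics([0, 1, 3, 4], fps=10))
```

In practice raw rtMRI-derived trajectories are noisy, so a smoothing step before differencing would usually be needed; it is omitted here to keep the arithmetic visible.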
Learning-free L2-Accented Speech Generation using Phonological Rules
Open MIND · 2026-03-08
preprint
Automation of real-time vocal tract image segmentation with SAM 2.0 and morphological operation implementation
JASA Express Letters · 2026-03-01
article · Open access
Modeling articulatory representations is critical to the scientific study of speech production, including its relation to speech acoustics. However, discretizing articulatory dynamics in continuous speech has proven computationally taxing. For example, segmentation analyses of real-time vocal tract images deploying contour-tracking methods, while successful, require manual creation of templates and human supervised assessment [e.g., Bresch and Narayanan (2009). IEEE Trans. Med. Imaging. 28(3), 323-338]. In this paper, we utilize Segment Anything Model 2 (SAM 2.0) [Ravi et al. (2024). arXiv:2408.00714] to efficiently segment critical articulators in real-time magnetic resonance imaging speech production data without fine-tuning and with global nonlinear image filtering to examine such systems' ability to segment speech dynamics, which have both language- and subject-specific characteristics.
Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
ArXiv.org · 2026-01-20
article · Open access
Interpretable Modeling of Articulatory Temporal Dynamics from Real-Time MRI for Phoneme Recognition
2026-04-21
article · Open access
Real-time Magnetic Resonance Imaging (rtMRI) visualizes vocal tract action, offering a comprehensive window into speech articulation. However, its signals are high dimensional and noisy, hindering interpretation. We investigate compact representations of spatiotemporal articulatory dynamics for phoneme recognition from midsagittal vocal tract rtMRI videos. We compare three feature types: (1) raw video, (2) optical flow, and (3) six linguistically-relevant regions of interest (ROIs) for articulator movements. We evaluate models trained independently on each representation, as well as multi-feature combinations. Results show that multi-feature models consistently outperform single-feature baselines, with the lowest phoneme error rate (PER) of 0.34 obtained by combining ROI and raw video. Temporal fidelity experiments demonstrate a reliance on fine-grained articulatory dynamics, while ROI ablation studies reveal strong contributions from tongue and lips. Our findings highlight how rtMRI-derived features provide accuracy and interpretability, and establish strategies for leveraging articulatory data in speech processing. The source code is publicly available.
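The rtMRI phoneme-recognition study reports a best phoneme error rate (PER) of 0.34. Under the standard definition, PER is the Levenshtein edit distance (substitutions + deletions + insertions) between the reference and recognized phoneme sequences, divided by the reference length. A minimal sketch of that computation follows; the paper's exact scoring details (phoneme set, collapsing rules) are not stated in the abstract, and the function name is hypothetical.

```python
def phoneme_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """PER = Levenshtein distance / len(reference). Reference must be non-empty."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))  # distances from an empty reference prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[n] / m


print(phoneme_error_rate(["K", "AE", "T"], ["K", "AH", "T"]))  # one substitution
```

Because insertions are counted, PER can exceed 1.0 when the recognizer outputs many spurious phonemes, so a value of 0.34 means roughly one error per three reference phonemes.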

Recent grants

Speech Prosody and Articulatory Dynamics in Spoken Language
NIH · $5.9M · 1997–2019
Doctoral Dissertation Research: Articulatory Dynamics and Stability of Multi-gesture Complexes
NSF · $19k · 2021–2023
NIH Grant R29DC003172
NIH · $553k · 2002

Frequent coauthors

Louis Goldstein
77 shared
Shrikanth Narayanan
60 shared
Sungbok Lee
Korea Advanced Institute of Science and Technology
41 shared
Krishna S. Nayak
21 shared
Asterios Toutios
University of Southern California
21 shared
Elliot Saltzman
Boston University
19 shared
Erik Bresch
Philips (Netherlands)
17 shared
Benjamin Parrell
University of Wisconsin–Madison
15 shared

Education

Ph.D., Linguistics
University of Southern California
M.A., Linguistics
University of Southern California
B.A., Linguistics
University of Southern California
