Khalil Iskarous

Professor of Linguistics

University of Southern California · Linguistics

Active 2001–2026

h-index: 19

Citations: 1.3k

Papers: 98 (18 in the last 5 years)

Funding: $1.6M (1 active)

About

Khalil Iskarous is a Professor of Linguistics at USC Dornsife whose research centers on the computational nature of cognition and motor control. He explores the combinatorial computational processes that organize lower-level units into higher-level units in both linguistic cognition and animal motor control: how sounds combine into words and discourse, and how muscular contractions organize into complex, purposeful movements in animals. Iskarous employs tools such as dynamical systems analysis and modern AI methods, including Neural Turing Machines, Hopfield Neural Networks, and Attention-based Neural Networks, to investigate these processes. His research has covered phonological cognition, speech production and perception, language change, atypical speech, and the inner workings of AI systems. In motor control, he has studied human tongue dynamics using MRI, octopus arm movements, and the behavior of C. elegans worms. He hypothesizes that comparing cognition and motor control, in both their computational similarities and their differences, can yield insight into each domain. Supported by multiple NSF grants, his work aims to develop a deeper understanding of the underlying computational principles of animal movement and speech perception.

Research topics

Computer Science
Natural Language Processing
Linguistics
Artificial Intelligence
Engineering
Biology
Speech recognition
Fishery
Medicine
Combinatorics
Programming language
Mathematics

Selected publications

Holistic distributional signatures of F0 dynamics
2026-05-14
article · Open access · 1st author · Corresponding author
The analysis of the F0 aspects of linguistic patterning in different languages usually consists of identifying various factors involving F0, such as obstruent effects on F0 in vowels, or a variety of prosodic patterns involving F0. In this paper we advocate for a complementary holistic approach in which we take many F0 trajectories in a language, calculate the difference in F0 from one frame to another, and then compute a probability distribution of F0 changes. This distribution is then a holistic representative of the language's F0 use. Using 60-minute portions of speech from a diverse set of 20 languages, we show that the probability distribution of F0 change acts as a signature of that language's use of F0, with well-differentiated modes distinctive of the language. We also propose Optimal Transport-based methods for comparing the probability distributions of F0 in different languages. In addition, we show that even though the probability distributions are empirically based, they can also be regarded as Boltzmann distributions, and therefore as dynamical system theories, closely related to previous dynamical theories of speech. This holistic description of each language and its difference from others could lead to better quantification of prosodic typologies, as well as typologies of F0 perturbations due to obstruents. In addition, since AI learning is usually regarded as parameterized probability distribution estimation, this approach could make the similarities and differences between the human and AI speech skills clearer.
Automation of real-time vocal tract image segmentation with SAM 2.0 and morphological operation implementation
JASA Express Letters · 2026-03-01
article · Open access · Senior author
Modeling articulatory representations is critical to the scientific study of speech production, including its relation to speech acoustics. However, discretizing articulatory dynamics in continuous speech has proven computationally taxing. For example, segmentation analyses of real-time vocal tract images deploying contour-tracking methods, while successful, require manual creation of templates and human supervised assessment [e.g., Bresch and Narayanan (2009). IEEE Trans. Med. Imaging. 28(3), 323-338]. In this paper, we utilize Segment Anything Model 2 (SAM 2.0) [Ravi et al. (2024). arXiv:2408.00714] to efficiently segment critical articulators in real-time magnetic resonance imaging speech production data without fine-tuning and with global nonlinear image filtering to examine such systems' ability to segment speech dynamics, which have both language- and subject-specific characteristics.
Comparing LLM-Based Translation Approaches for Extremely Low-Resource Languages
2026-01-01
article · Open access · Senior author
Jared Coleman, Ruben Rosales, Kira Toal, Diego Cuadros, Nicholas Leeds, Bhaskar Krishnamachari, Khalil Iskarous. Proceedings for the Ninth Workshop on Technologies for Machine Translation of Low Resource Languages (LoResMT 2026). 2026.
Comparing LLM-Based Translation Approaches for Extremely Low-Resource Languages
Underline Science Inc. · 2026-03-14
other · Open access
We present a comprehensive evaluation and extension of the LLM-Assisted Rule-Based Machine Translation (LLM-RBMT) paradigm, an approach that combines the strengths of rule-based methods and Large Language Models (LLMs) to support translation in no-resource settings. We present a robust new implementation (the Pipeline Translator) that generalizes the LLM-RBMT approach and enables flexible adaptation to novel constructions. We benchmark it against four alternatives (Builder, Instructions, RAG, and Fine-tuned translators) on a curated dataset of 150 English sentences, and compare them across translation quality and runtime. The Pipeline Translator consistently achieves the best overall performance. The LLM-RBMT methods (Pipeline and Builder) also offer an important advantage: they naturally align with evaluation strategies that prioritize grammaticality and semantic fidelity over surface-form overlap, which is critical for endangered languages where mistranslation carries high risk.
Instantaneous changes in acoustic signals reflect syllable progression and cross-linguistic syllable variation
2025-08-17
article
Articulatory stability of consonants in syllable nucleus versus syllable margin
The Journal of the Acoustical Society of America · 2025-10-01
article
Phonological structure shapes the relative timing and coordination of the articulatory movements within a syllable. While intergestural timing of consonants, both in clusters and relative to their vocalic syllable nucleus, is well established for English (e.g., Byrd 1996, Browman and Goldstein 2000, Nam et al. 2009, Byrd & Choi 2010), the kinematic properties of consonantal nuclei remain less understood. One study of Slovak investigated the articulatory coordination for syllabic liquids (Pouplier & Beňuš 2011), finding no distinctions in articulatory measurements of the liquid consonant when occupying nucleus versus onset/coda position. To further probe intrasyllable spatiotemporal organization of syllabic consonants, we conduct a study in English that manipulates the structural position of a consonant (e.g., nucleus versus onset [n]) while maintaining the segmental ordering with a preceding consonant (e.g., deepen versus deep not). We test whether the spatiotemporal stability of the consonantal constriction target differs as a function of structural position. We utilize the AI model SAM 2.0 (Ravi et al. 2024) to perform automated image segmentation on vocal tract rtMRI data of read speech. Further illumination of the kinematics of syllabic consonants in speech can serve to complement previous models that have specifically addressed consonant-vowel sequencing. [Work supported by NSF.]
Towards a dynamical model of transitions between fluent and stuttered speech
2025-08-17 · 1 citation
article
Articulatory Phonology
Elsevier eBooks · 2025-01-01
book-chapter · 1st author · Corresponding author
Speech transformer models demonstrate a sensitivity to articulatory events
The Journal of the Acoustical Society of America · 2025-10-01
article · 1st author · Corresponding author
The black box of speech recognition LLMs has yet to give up the secret of how these systems are able to extract linguistically salient information from continuous audio. One hypothesis is that these systems use low-level correlations in the audio signal, while a complementary view hypothesizes that they capture causal information that structures the signal in a way relevant to speech production. This quandary is addressed here by deploying the HuBERT LLM on audio for which we have simultaneous real-time vocal tract MRI. We probe correlations of attentional dynamics that incorporate acoustic measures and those that incorporate articulatory change as indexed by proxy MFCCs. Additionally, we extract airway edges from vocal tract rtMRI video of read speech. HuBERT-Large with 24 encoders and 16 attention heads in each encoder was used. We demonstrate that while the model’s attentional mechanisms can and do focus on acoustic, spectral, and articulatory events, we can provide analysis based on the acoustic theory of speech production that these results are best explained by assuming that these systems are aware of the causal status of the articulatory events that generate the speech signal. [Work supported by NSF.]
Phase-locking of oscillatory acoustic signals reflects syllable progression and variation
The Journal of the Acoustical Society of America · 2024-10-01
article
While a common linguistic view of the cognitive representation of speech is that it is composed of sequenced syllable units, how exactly syllables as abstract cognitive compositional structure relate to quantifiable patterns in the observable signals of articulation and acoustics remains opaque. Previous work has suggested that oscillatory acoustic properties can serve to link linguistic representations and physical events (Tilsen & Arvaniti 2013). We further probe this relationship by testing the temporal coordination between oscillatory signal measures—changes in spectral energy and in amplitude—and syllable boundary locations through the use of phase-locking analyses (Lancia et al. 2023). Results in both English and Tashlhiyt for vocalic and consonantal syllabic nuclei show significant phase-locking values (PLVs) and demonstrate that these signal measures track syllable progression across typologically different languages. Furthermore, the cross-language preferences in syllable nucleus types are reflected in their respective PLVs. Specifically, vocalic nuclei exhibit the highest PLVs, followed by wide aperture consonantal nuclei (sonorants), and lastly by consonantal nuclei with narrow-to-closed constrictions (obstruents). Overall, the findings demonstrate a tight coordination between abstract syllable units and quantifiable signal properties and additionally provide novel dynamical grounding for cross-linguistic nucleus preferences.

Recent grants

INSPIRE: Dynamical Principles of Animal Movement
NSF · $994k · 2012–2017
CompCog: Deep causal inference grounds the perception of cognitive objects in speech
NSF · $600k · 2023–2026

Frequent coauthors

D. H. Whalen · City University of New York · 25 shared
Louis Goldstein · 19 shared
Michael Proctor · 18 shared
Aude Noiray · Centre National de la Recherche Scientifique · 17 shared
Christine H. Shadle · Yale University · 11 shared
Jennifer A. Mather · 8 shared
Shrikanth Narayanan · 8 shared
Jean Alupay · ME Association · 8 shared

Labs

Khalil Iskarous Laboratory · PI

Awards & honors

CompCog: Deep Causal Inference Grounds The Perception of Cog…
Dynamical Principles of Animal Movement (National Science Fo…
