Gopala Krishna Anumanchipalli

Assistant Professor · Verified

University of California, Berkeley · Department of Electrical Engineering and Computer Sciences

Active 2007–2025

h-index: 19
Citations: 2.9k
Papers: 94 (60 in the last 5 years)

About

Gopala Krishna Anumanchipalli is the Robert E. and Beverly A. Brooks Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, and also holds a position in the Department of Neurosurgery at UC San Francisco. He leads the Berkeley Speech Group, focusing on the intersection of speech processing, neuroscience, and artificial intelligence. His research emphasizes human-centered speech and assistive technologies, including the development of bio-inspired spoken language technologies, automated methods for early diagnosis, and approaches to characterize and rehabilitate disordered speech. His broader interests include healthcare, conversational AI, and multimodal learning. The Berkeley Speech Group is part of the Berkeley AI Research (BAIR) community. Gopala Anumanchipalli completed his PhD at Carnegie Mellon University and Instituto Superior Tecnico under the advisement of Alan Black and Luis Oliveira, followed by a postdoctoral fellowship at UC San Francisco with Edward Chang. He earned his B.Tech and M.S. degrees from IIIT Hyderabad under Raj Reddy.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Neuroscience
  • Psychology
  • Speech recognition
  • Telecommunications
  • Human–computer interaction
  • Linguistics
  • Audiology
  • Medicine
  • Physical medicine and rehabilitation

Selected publications

  • Automatic Detection of Articulatory-Based Disfluencies in Primary Progressive Aphasia

    IEEE Journal of Selected Topics in Signal Processing · 2025-06-16 · 4 citations

    article · Open access · Senior author

    Speech corpora are collections of textual data derived from human verbal output and speech signals that can be processed from a variety of perspectives, including formal or semantic content, to serve analyses of different levels of linguistic organisation (phonemic, morphosyntactic, lexico-semantic and content information, prosody and intonation) and to serve analyses of important phenomena such as speech fluency and errors (non-fluencies). We focus on transcribing speech along with non-fluencies or dysfluencies, the detection of which plays an important role in the diagnosis of primary progressive aphasia, where we specifically examine articulation-based dysfluencies in nfvPPA speech. In this work, we propose SSDM 2.0, which is built on top of the current state-of-the-art system of dysfluency detection [1] and tackles its shortcomings via four main contributions: (1) We propose a novel Neural Articulatory Flow for deriving highly scalable, dysfluency-aware speech representations. (2) We develop a full-stack connectionist subsequence aligner to capture all major dysfluency types. (3) We introduce a mispronunciation prompt pipeline and consistency learning into LLMs to enable in-context dysfluency learning. (4) We curate and open-source Libri-Co-Dys [1], the largest co-dysfluency corpus to date. (5) We also present SSDM-L, a modular, non-end-to-end, lightweight model designed for clinical deployment. In clinical experiments on pathological speech transcription, we tested SSDM 2.0 using nfvPPA corpus primarily characterized by articulatory dysfluencies. Overall, SSDM 2.0 outperforms SSDM and all other dysfluency transcription models by a large margin. See our project demo page at https://berkeley-speech-group.github.io/SSDM2.0/.

  • Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

    ArXiv.org · 2025-03-12 · 1 citation

    preprint · Open access

    Large language models (LLMs) have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model which generates structured, high-level plans to achieve user goals, and an Executor model that translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.

  • Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities

    ArXiv.org · 2025-03-06

    preprint · Open access

    Spoken dialogue modeling poses challenges beyond text-based language modeling, requiring real-time interaction, turn-taking, and backchanneling. While most Spoken Dialogue Models (SDMs) operate in half-duplex mode, processing one turn at a time, emerging full-duplex SDMs can listen and speak simultaneously, enabling more natural conversations. However, current evaluations remain limited, focusing mainly on turn-based metrics or coarse corpus-level analyses. To address this, we introduce Full-Duplex-Bench, a benchmark that systematically evaluates key interactive behaviors: pause handling, backchanneling, turn-taking, and interruption management. Our framework uses automatic metrics for consistent, reproducible assessment and provides a fair, fast evaluation setup. By releasing our benchmark and code, we aim to advance spoken dialogue modeling and foster the development of more natural and engaging SDMs.

  • A High-Performance Neuroprosthesis for Speech Decoding and Avatar Control

    Springer briefs in electrical and computer engineering · 2025-01-01

    book-chapter
  • A streaming brain-to-voice neuroprosthesis to restore naturalistic communication

    Nature Neuroscience · 2025-03-31 · 56 citations

    article · Senior author
  • TART: A Comprehensive Tool for Technique-Aware Audio-to-Tab Guitar Transcription

    ArXiv.org · 2025-10-02

    preprint · Open access · Senior author

    Automatic Music Transcription (AMT) has advanced significantly for the piano, but transcription for the guitar remains limited due to several key challenges. Existing systems fail to detect and annotate expressive techniques (e.g., slides, bends, percussive hits) and incorrectly map notes to the wrong string and fret combination in the generated tablature. Furthermore, prior models are typically trained on small, isolated datasets, limiting their generalizability to real-world guitar recordings. To overcome these limitations, we propose a four-stage end-to-end pipeline that produces detailed guitar tablature directly from audio. Our system consists of (1) Audio-to-MIDI pitch conversion through a piano transcription model adapted to guitar datasets; (2) MLP-based expressive technique classification; (3) Transformer-based string and fret assignment; and (4) LSTM-based tablature generation. To the best of our knowledge, this framework is the first to generate detailed tablature with accurate fingerings and expressive labels from guitar audio.

  • SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

    2024-03-18 · 4 citations

    article · Senior author

    Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space and speech units beyond phonemes are largely underexplored. Here, we demonstrate that a syllabic organization emerges in learning sentence-level representation of speech. In particular, we adopt a "self-distillation" objective to fine-tune the pretrained HuBERT with an aggregator token that summarizes the entire sentence. Without any supervision, the resulting model draws definite boundaries in speech, and the representations across frames exhibit salient syllabic structures. We demonstrate that this emergent structure largely corresponds to the ground truth syllables. Furthermore, we propose a new benchmark task, Spoken Speech ABX, for evaluating sentence-level representation of speech. When compared to previous models, our model outperforms in both unsupervised syllable discovery and learning sentence-level representation. Together, we demonstrate that the self-distillation of HuBERT gives rise to syllabic organization without relying on external labels or modalities, and potentially provides novel data-driven units for spoken language modeling.

  • Coding Speech Through Vocal Tract Kinematics

    IEEE Journal of Selected Topics in Signal Processing · 2024-11-20 · 8 citations

    article · Senior author

    Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech – Speech Articulatory Coding (SPARC). SPARC comprises an articulatory analysis model that infers articulatory features from speech audio, and an articulatory synthesis model that synthesizes speech audio from articulatory features. The articulatory features are kinematic traces of vocal tract articulators and source features, which are intuitively interpretable and controllable, being the actual physical interface of speech production. An additional speaker identity encoder is jointly trained with the articulatory synthesizer to inform the voice texture of individual speakers. By training on large-scale speech data, we achieve a fully intelligible, high-quality articulatory synthesizer that generalizes to unseen speakers. Furthermore, the speaker embedding is effectively disentangled from articulations, which enables accent-preserving zero-shot voice conversion. To the best of our knowledge, this is the first demonstration of universal, high-performance articulatory inference and synthesis, suggesting the proposed framework as a powerful coding system of speech.

  • Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology

    2024-09-01 · 3 citations

    article · Senior author
  • Coding Speech through Vocal Tract Kinematics

    arXiv (Cornell University) · 2024-06-18 · 2 citations

    preprint · Open access · Senior author

    Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- Speech Articulatory Coding (SPARC). SPARC comprises an articulatory analysis model that infers articulatory features from speech audio, and an articulatory synthesis model that synthesizes speech audio from articulatory features. The articulatory features are kinematic traces of vocal tract articulators and source features, which are intuitively interpretable and controllable, being the actual physical interface of speech production. An additional speaker identity encoder is jointly trained with the articulatory synthesizer to inform the voice texture of individual speakers. By training on large-scale speech data, we achieve a fully intelligible, high-quality articulatory synthesizer that generalizes to unseen speakers. Furthermore, the speaker embedding is effectively disentangled from articulations, which enables accent-preserving zero-shot voice conversion. To the best of our knowledge, this is the first demonstration of universal, high-performance articulatory inference and synthesis, suggesting the proposed framework as a powerful coding system of speech.

Frequent coauthors

  • Edward F. Chang

    Neurological Surgery

    53 shared
  • Alan W. Black

    39 shared
  • Peter Wu

    39 shared
  • Cheol Jun Cho

    27 shared
  • Kaylo T. Littlejohn

    University of California, San Francisco

    23 shared
  • Josh Chartier

    Neurological Surgery

    20 shared
  • Jiachen Lian

    18 shared
  • Inga Zhuravleva

    Berkeley College

    17 shared

Education

  • Ph.D.

    Carnegie Mellon University and Instituto Superior Tecnico
  • B.Tech and M.S.

    IIIT Hyderabad

Awards & honors

  • Hellman Fellow (2023)
  • Google Faculty Research Award (2022)
  • Rose Hills Innovator Program (2021)