Muhammad Ali Gulzar

Assistant Professor · Verified

Virginia Tech · Computer Science

Active 2015–2026

h-index: 12
Citations: 721
Papers: 52 (36 in the last 5 years)
Funding: $324k

About

Muhammad Ali Gulzar is an Assistant Professor in the Department of Computer Science at Virginia Tech. He completed his Ph.D. in computer science at the University of California, Los Angeles in 2020, and holds a B.S. in computer science from Lahore University of Management Sciences, Pakistan, awarded in 2014. His research focuses on software engineering, specifically debugging and testing emerging software. He is based at the Gilbert Place location in Blacksburg, VA, and is involved in research activities within the Institute for Advanced Computing. He can be reached by email at gulzar@vt.edu or by phone at (540) 231-0851.

Research topics

  • Data Mining
  • Computer Science
  • Machine Learning
  • Programming language
  • Artificial Intelligence
  • Reliability engineering
  • Database
  • Engineering

Selected publications

  • TENSURE: Fuzzing Sparse Tensor Compilers (Registered Report)

    arXiv (Cornell University) · 2026-03-19

    preprint · Open access

    Sparse Tensor Compilers (STCs) have emerged as critical infrastructure for optimizing high-dimensional data analytics and machine learning workloads. The STCs must synthesize complex, irregular control flow for various compressed storage formats directly from high-level declarative specifications, thereby making them highly susceptible to subtle correctness defects. Existing testing frameworks, which rely on mutating computation graphs restricted to a standard vocabulary of operators, fail to exercise the arbitrary loop synthesis capabilities of these compilers. Furthermore, generic grammar-based fuzzers struggle to generate valid inputs due to the strict rules governing how indices are reused across multiple tensors. In this paper, we present TENSURE, the first extensible black-box fuzzing framework specifically designed for the testing of STCs. TENSURE leverages Einstein Summation (Einsum) notation as a general input abstraction, enabling the generation of complex, unconventional tensor contractions that expose corner cases in the code-generation phases of STCs. We propose a novel constraint-based generation algorithm that guarantees 100% semantic validity of synthesized kernels, significantly outperforming the ~3.3% validity rate of baseline grammar fuzzers. To enable metamorphic testing without a trusted reference, we introduce a set of semantic-preserving mutation operators that exploit algebraic commutativity and heterogeneity in storage formats. Our evaluation on two state-of-the-art systems, TACO and Finch, reveals widespread fragility, particularly in TACO, where TENSURE exposed crashes or silent miscompilations in a majority of generated test cases. These findings underscore the critical need for specialized testing tools in the sparse compilation ecosystem.

  • Assessing the Impact of Code Changes on the Fault Localizability of Large Language Models

    Zenodo (CERN European Organization for Nuclear Research) · 2026-02-27

    article · Open access · Senior author
  • Evaluating LLM-Based Test Generation Under Software Evolution

    ArXiv.org · 2026-03-24

    article · Open access · Senior author

    Large Language Models (LLMs) are increasingly used for automated unit test generation. However, it remains unclear whether these tests reflect genuine reasoning about program behavior or simply reproduce superficial patterns learned during training. If the latter dominates, LLM-generated tests may exhibit weaknesses such as reduced coverage, missed regressions, and undetected faults. Understanding how LLMs generate tests and how those tests respond to code evolution is therefore essential. We present a large-scale empirical study of LLM-based test generation under program changes. Using an automated mutation-driven framework, we analyze how generated tests react to semantic-altering changes (SAC) and semantic-preserving changes (SPC) across eight LLMs and 22,374 program variants. LLMs achieve strong baseline results, reaching 79% line coverage and 76% branch coverage with fully passing test suites on the original programs. However, performance degrades as programs evolve. Under SACs, the pass rate of newly generated tests drops to 66%, and branch coverage declines to 60%. More than 99% of failing SAC tests pass on the original program while executing the modified region, indicating residual alignment with the original behavior rather than adaptation to updated semantics. Performance also declines under SPCs despite unchanged functionality: pass rates fall to 79% and branch coverage to 69%. Although SPC edits preserve semantics, they often introduce larger syntactic changes, leading to instability in generated test suites. Models generate more new tests while discarding many baseline tests, suggesting sensitivity to lexical changes rather than true semantic impact. Overall, our results indicate that current LLM-based test generation relies heavily on surface-level cues and struggles to maintain regression awareness as programs evolve.

  • ProToken: Token-Level Attribution for Federated Large Language Models

    ArXiv.org · 2026-01-27

    article · Open access · Senior author

    Federated Learning (FL) enables collaborative training of Large Language Models (LLMs) across distributed data sources while preserving privacy. However, when federated LLMs are deployed in critical applications, it remains unclear which client(s) contributed to specific generated responses, hindering debugging, malicious client identification, fair reward allocation, and trust verification. We present ProToken, a novel Provenance methodology for Token-level attribution in federated LLMs that addresses client attribution during autoregressive text generation while maintaining FL privacy constraints. ProToken leverages two key insights to enable provenance at each token: (1) transformer architectures concentrate task-specific signals in later blocks, enabling strategic layer selection for computational tractability, and (2) gradient-based relevance weighting filters out irrelevant neural activations, focusing attribution on neurons that directly influence token generation. We evaluate ProToken across 16 configurations spanning four LLM architectures (Gemma, Llama, Qwen, SmolLM) and four domains (medical, financial, mathematical, coding). ProToken achieves 98% average attribution accuracy in correctly localizing responsible client(s), and maintains high accuracy when the number of clients are scaled, validating its practical viability for real-world deployment settings.

  • Evaluating LLM-Based Test Generation Under Code Changes

    Zenodo (CERN European Organization for Nuclear Research) · 2026-03-07

    article · Open access · Senior author
  • Evaluating LLM-Based Test Generation Under Code Changes

    Open MIND · 2026-03-07

    article · Senior author

Frequent coauthors

  • Miryung Kim

    28 shared
  • Ali Anwar

    12 shared
  • Abdul Haddi Amjad

    Virginia Tech

    11 shared
  • Matteo Interlandi

    Microsoft (United States)

    9 shared
  • Waris Gill

    9 shared
  • Tyson Condie

    9 shared
  • Zubair Shafiq

    9 shared
  • Sai Deep Tetali

    META Health

    7 shared

Education

  • Ph.D., Computer Science

    University of California, Los Angeles
