Miryung Kim

Professor / Vice Chair for Graduate Programs

University of California, Los Angeles · Computer Science

Active 2002–2026

h-index: 43
Citations: 7.2k
Papers: 154 (50 in the last 5 years)
Funding: $3.1M (1 active)

About

Miryung Kim is a Professor and Vice Chair for Graduate Programs in the Department of Computer Science at the UCLA Samueli School of Engineering. Her research focuses on software engineering for data analytics (SE4DA), including debugging, testing, and optimization for big data systems, as well as software engineering for AI/ML-based systems. She studies professional data scientists and their workflows, contributing to the understanding and advancement of software practices in data-intensive environments. Dr. Kim holds a PhD (2008) and an MS (2003) from the University of Washington. Her publications in software maintenance, automated software engineering, and data analytics testing have earned her awards such as the Most Influential Paper Award at ICSME 2023, the ACM SIGSOFT Influential Educator Award in 2022, and recognition as an ACM Distinguished Member. She has also delivered distinguished lectures at Carnegie Mellon University and UIUC and served as a program co-chair for major conferences.

Research topics

  • Computer Science
  • Data Mining
  • Artificial Intelligence
  • Programming language
  • Computer Security
  • Machine Learning
  • Engineering
  • Software engineering
  • Database
  • Reliability engineering
  • Distributed computing

Selected publications

  • ExplainFuzz: Explainable and Constraint-Conditioned Test Generation with Probabilistic Circuits

    arXiv (Cornell University) · 2026-04-08

    preprint · Open access · Senior author

    Understanding and explaining the structure of generated test inputs is essential for effective software testing and debugging. Existing approaches--including grammar-based fuzzers, probabilistic Context-Free Grammars (pCFGs), and Large Language Models (LLMs)--suffer from critical limitations. They frequently produce ill-formed inputs that fail to reflect realistic data distributions, struggle to capture context-sensitive probabilistic dependencies, and lack explainability. We introduce ExplainFuzz, a test generation framework that leverages Probabilistic Circuits (PCs) to learn and query structured distributions over grammar-based test inputs interpretably and controllably. Starting from a Context-Free Grammar (CFG), ExplainFuzz compiles a grammar-aware PC and trains it on existing inputs. New inputs are then generated via sampling. ExplainFuzz utilizes the conditioning capability of PCs to incorporate test-specific constraints (e.g., a query must have GROUP BY), enabling constrained probabilistic sampling to generate inputs satisfying grammar and user-provided constraints. Our results show that ExplainFuzz improves the coherence and realism of generated inputs, achieving significant perplexity reduction compared to pCFGs, grammar-unaware PCs, and LLMs. By leveraging its native conditioning capability, ExplainFuzz significantly enhances the diversity of inputs that satisfy a user-provided constraint. Compared to grammar-aware mutational fuzzing, ExplainFuzz increases bug-triggering rates from 35% to 63% in SQL and from 10% to 100% in XML. These results demonstrate the power of a learned input distribution over mutational fuzzing, which is often limited to exploring the local neighborhood of seed inputs. These capabilities highlight the potential of PCs to serve as a foundation for grammar-aware, controllable test generation that captures context-sensitive, probabilistic dependencies.

  • Change And Cover: Last-Mile, Pull Request-Based Regression Test Augmentation

    arXiv (Cornell University) · 2026-01-16

    preprint · Open access

    Software is in constant evolution, with developers frequently submitting pull requests (PRs) to introduce new features or fix bugs. Testing PRs is critical to maintaining software quality. Yet, even in projects with extensive test suites, some PR-modified lines remain untested, leaving a "last-mile" regression test gap. Existing test generators typically aim to improve overall coverage, but do not specifically target the uncovered lines in PRs. We present Change And Cover (ChaCo), an LLM-based test augmentation technique that addresses this gap. It makes three contributions: (i) ChaCo considers the PR-specific patch coverage, offering developers augmented tests for code just when it is on the developers' mind. (ii) We identify providing suitable test context as a crucial challenge for an LLM to generate useful tests, and present two techniques to extract relevant test content, such as existing test functions, fixtures, and data generators. (iii) To make augmented tests acceptable for developers, ChaCo carefully integrates them into the existing test suite, e.g., by matching the test's structure and style with the existing tests, and generates a summary of the test addition for developer review. We evaluate ChaCo on 145 PRs from three popular and complex open-source projects - SciPy, Qiskit, and Pandas. The approach successfully helps 30% of PRs achieve full patch coverage, at the cost of $0.11, showing its effectiveness and practicality. Human reviewers find the tests to be worth adding (4.53/5.0), well integrated (4.2/5.0), and relevant to the PR (4.7/5.0). Ablations show test context is crucial for context-aware test generation, leading to 2x coverage. We submitted 12 tests, of which 8 have already been merged, and two previously unknown bugs were exposed and fixed. We envision our approach to be integrated into CI workflows, automating the last mile of regression test augmentation.

  • Artifact for "WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis"

    Zenodo (CERN European Organization for Nuclear Research) · 2026-01-14

    other · Open access · Senior author

    This artifact accompanies the ICSE 2026 paper "WhyFlow: Interrogative Debugger for Sensemaking Taint Analysis." WhyFlow is an interrogative debugging tool for taint analysis that enables developers to ask why, why-not, and what-if questions about dataflows.

    Contents:
    - WhyFlow web application (Meteor-based)
    - Pre-computed CodeQL and Soufflé analysis results for Apache Dubbo
    - User study data and statistical analysis scripts
    - Docker environment for easy reproduction

    Reproduction tracks:
    - Track A: User study analysis (~10 minutes)
    - Track B: Running WhyFlow interactively (~30 minutes, includes Docker build)

    See replication/Experiment-Reproduction.md for detailed instructions.

  • Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions

    Proceedings of the ACM on Software Engineering · 2025-06-19 · 1 citation

    article · Open access

    Machine learning (ML) applications have become an integral part of our lives. ML applications extensively use floating-point computation and involve very large/small numbers; thus, maintaining the numerical stability of such complex computations remains an important challenge. Numerical bugs can lead to system crashes, incorrect output, and wasted computing resources. In this paper, we introduce a novel idea, namely soft assertions (SA), to encode safety/error conditions for the places where numerical instability can occur. A soft assertion is an ML model automatically trained using the dataset obtained during unit testing of unstable functions. Given the values at the unstable function in an ML application, a soft assertion reports how to change these values in order to trigger the instability. We then use the output of soft assertions as signals to effectively mutate inputs to trigger numerical instability in ML applications. In the evaluation, we used the GRIST benchmark, a total of 79 programs, as well as 15 real-world ML applications from GitHub. We compared our tool with 5 state-of-the-art (SOTA) fuzzers. We found all the GRIST bugs and outperformed the baselines. We found 13 numerical bugs in real-world code, one of which had already been confirmed by the GitHub developers. While the baselines mostly found the bugs that report NaN and INF, our tool found numerical bugs with incorrect output. We showed one case where the Tumor Detection Model, trained on Brain MRI images, should have predicted "tumor", but instead, it incorrectly predicted "no tumor" due to the numerical bugs. Our replication package is located at https://figshare.com/s/6528d21ccd28bea94c32.

  • Chrysalis: A Lightweight Logging and Replay Framework for Metamorphic Testing in Python

    2025-11-16

    article · Senior author

    Metamorphic testing (MT) is a powerful technique for software testing. We introduce Chrysalis, a lightweight, extensible logging and replay-based metamorphic testing framework in Python. Chrysalis allows developers to define custom input transformations and their associated invariants, then execute structured metamorphic testing campaigns. Its key innovation is a lightweight logging mechanism that records the full history of transformations applied to an input. This compact representation enables developers to not only identify test failures but also to replay the exact sequence of transformations leading to a bug, facilitating debugging. We demonstrate Chrysalis's effectiveness through two case studies: auditing a machine learning model for fairness and assessing the robustness of large language models. A screencast demonstrating Chrysalis is available at: https://youtu.be/xJG4qghxlIs, and the source code is available at: https://github.com/Chrysalis-Test/Chrysalis.

  • PALM: Path-aware LLM-based Test Generation with Comprehension

    arXiv (Cornell University) · 2025-06-24

    preprint · Open access · Senior author

    Symbolic execution is a widely used technique for test generation, offering systematic exploration of program paths through constraint solving. However, it is fundamentally constrained by the capability to model the target code, including library functions, in terms of symbolic constraints and by the capability of underlying constraint solvers. As a result, many paths involving complex features remain unanalyzed or insufficiently modeled. Recent advances in large language models (LLMs) have shown promise in generating diverse and valid test inputs. Yet, LLMs lack mechanisms for systematically enumerating program paths and often fail to cover subtle corner cases. We observe that directly prompting an LLM with the full program leads to missed coverage of interesting paths. In this paper, we present PALM, a test generation system that combines symbolic path enumeration with LLM-assisted test generation. PALM statically enumerates possible paths through AST-level analysis and transforms each into an executable variant with embedded assertions that specify the target path. This avoids the need to translate path constraints into SMT formulas, by instead constructing program variants that the LLM can interpret. Importantly, PALM provides an interactive frontend that visualizes path coverage alongside generated tests, assembling tests based on the specific paths they exercise. A user study with 12 participants demonstrates that PALM's frontend helps users better understand path coverage and identify which paths are actually exercised by PALM-generated tests through verification and visualization of their path profiles.

  • From Noise to Knowledge: Interactive Summaries for Developer Alerts

    ArXiv.org · 2025-08-10

    preprint · Open access · Senior author

    Programmers using bug-finding tools often review their reported warnings one by one. Based on the insight that identifying recurring themes and relationships can enhance the cognitive process of sensemaking, we propose CLARITY, which supports interpreting tool-generated warnings through interactive inquiry. CLARITY derives summary rules for custom grouping of related warnings with active feedback. As users mark warnings as interesting or uninteresting, CLARITY's rule inference algorithm surfaces common symptoms, highlighting structural similarities in containment, subtyping, invoked methods, accessed fields, and expressions. We demonstrate CLARITY on Infer and SpotBugs warnings across two mature Java projects. In a within-subject user study with 14 participants, users articulated root causes for similar uninteresting warnings faster and with more confidence using CLARITY. We observed significant individual variation in desired grouping, reinforcing the need for customizable sensemaking. Simulation shows that with rule-level feedback, only 11.8 interactions are needed on average to align all inferred rules with a simulated user's labels (vs. 17.8 without). Our evaluation suggests that CLARITY's active learning-based summarization enhances interactive warning sensemaking.
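
The logging-and-replay idea described in the Chrysalis abstract above can be illustrated with a minimal sketch. This is not the Chrysalis API: every name here (`TRANSFORMS`, `buggy_median`, `run_campaign`, `replay`) is hypothetical, and the invariant (a median should not change when the input is permuted) is chosen only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical illustration only; none of these names come from Chrysalis.
# Each transformation permutes the input, so a correct median is invariant.
TRANSFORMS: Dict[str, Callable[[List[int]], List[int]]] = {
    "reverse": lambda xs: xs[::-1],
    "rotate": lambda xs: xs[1:] + xs[:1],
}


def buggy_median(xs: List[int]) -> int:
    # Deliberate bug: the input is never sorted before taking the middle.
    return xs[len(xs) // 2]


def run_campaign(
    sut: Callable[[List[int]], int],
    seed: List[int],
    schedule: List[str],
) -> Optional[List[str]]:
    """Apply transformations in order, logging each name.

    Returns the logged history up to the first invariant violation,
    or None if the invariant held for the whole schedule.
    """
    expected = sut(seed)
    current = list(seed)
    log: List[str] = []
    for name in schedule:
        current = TRANSFORMS[name](current)
        log.append(name)
        if sut(current) != expected:
            return log
    return None


def replay(seed: List[int], log: List[str]) -> List[int]:
    """Rebuild the failing input from the seed and the logged names."""
    current = list(seed)
    for name in log:
        current = TRANSFORMS[name](current)
    return current


if __name__ == "__main__":
    seed = [3, 1, 2]
    log = run_campaign(buggy_median, seed, ["reverse", "reverse", "rotate"])
    print(log)                # ['reverse', 'reverse', 'rotate']
    print(replay(seed, log))  # [1, 2, 3]
```

Storing only the transformation names keeps the log compact, and the failing input can be reconstructed on demand from the seed, which is the property the abstract highlights for debugging.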

Frequent coauthors

  • Muhammad Ali Gulzar

    28 shared
  • Guoqing Xu

    University of California, Los Angeles

    17 shared
  • David Notkin

    University of Washington

    15 shared
  • Tianyi Zhang

    Purdue University System

    14 shared
  • Na Meng

    12 shared
  • Xi Zheng

    10 shared
  • Thomas Zimmermann

    10 shared
  • Tyson Condie

    9 shared

Education

  • PhD, Computer Science and Engineering

    University of Washington

    2008
  • MS, Computer Science and Engineering

    University of Washington

    2003

Awards & honors

  • ICSME Most Influential Paper Award (2023)
  • Distinguished Lecture at Carnegie Mellon University (2023)
  • ACM SIGSOFT Influential Educator Award (2022)
  • Distinguished Lecture at UIUC (2021)
  • ACM Distinguished Member Program (2022)