Michael Ernst
Professor · University of Washington · Computer Science & Engineering
Active 1975–2026
About
Michael Ernst is a professor who leads a research group of students, postdocs, and staff working on program understanding and analysis. He recruits motivated undergraduate, master's, and PhD students to contribute to this research. His group includes current PhD students, undergraduates, research staff, and many alumni who completed theses and dissertations under his supervision. These alumni have gone on to careers in academia and industry, including positions at universities, research institutions, and technology companies such as Google, Amazon, Microsoft, and Apple. The group's work spans software testing, program analysis, verification, and related areas.
Research topics
- Computer Science
- Mathematics
- Statistics
- Artificial Intelligence
- Engineering
- Machine Learning
- Reliability engineering
- Data Mining
- Programming language
- Real-time computing
- Distributed computing
- Theoretical computer science
Selected publications
Do LLMs Generate Useful Test Oracles? An Empirical Study with an Unbiased Dataset
Gesellschaft für Informatik (GI) · 2026-01-01
Article · Open access
In this article, we summarize our paper "Do LLMs Generate Useful Test Oracles? An Empirical Study with an Unbiased Dataset" [Mo25], which was recently accepted for presentation at the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE).
Test Oracle Generation for REST APIs
ACM Transactions on Software Engineering and Methodology · 2025-03-28 · 2 citations
Article · Open access
The number and complexity of test case generation tools for REST APIs have significantly increased in recent years. These tools excel in automating input generation but are limited by their test oracles, which can only detect crashes, regressions, and violations of API specifications or design best practices. This article introduces AGORA+, an approach for generating test oracles for REST APIs through the detection of invariants: output properties that should always hold. AGORA+ learns the expected behavior of an API by analyzing API requests and their corresponding responses. We enhanced the Daikon tool for dynamic detection of likely invariants, adding new invariant types and creating a front-end called Beet. Beet translates any OpenAPI specification and a set of API requests and responses into Daikon inputs. AGORA+ can detect 106 different types of invariants in REST APIs. We also developed PostmanAssertify, which converts the invariants identified by AGORA+ into executable JavaScript assertions. AGORA+ achieved a precision of 80% on 25 operations from 20 industrial APIs. It also identified 48% of errors systematically seeded in the outputs of the APIs under test. AGORA+ uncovered 32 bugs in popular APIs, including Amadeus, Deutschebahn, GitHub, Marvel, NYTimesBooks, and YouTube, leading to fixes and documentation updates.
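As a rough illustration of invariant-based oracles, the sketch below checks a few candidate properties against observed responses and keeps the ones that hold on every response. It is not the AGORA+, Beet, or Daikon implementation: the response fields (`limit`, `total`, `items`) and the hard-coded candidate list are assumptions made for this example.

```python
from typing import Any, Callable, Dict, List

Response = Dict[str, Any]

# Hypothetical candidate properties over one response. A real tool would
# derive candidates from the API specification instead of hard-coding them.
CANDIDATES: Dict[str, Callable[[Response], bool]] = {
    "len(items) <= limit": lambda r: len(r["items"]) <= r["limit"],
    "total >= len(items)": lambda r: r["total"] >= len(r["items"]),
    "total == len(items)": lambda r: r["total"] == len(r["items"]),
}


def likely_invariants(responses: List[Response]) -> List[str]:
    """Return the names of candidates that hold on every observed response."""
    return [
        name
        for name, holds in CANDIDATES.items()
        if all(holds(r) for r in responses)
    ]


def violated(invariant_names: List[str], response: Response) -> List[str]:
    """Use the surviving invariants as an oracle for a new response."""
    return [n for n in invariant_names if not CANDIDATES[n](response)]


# Made-up observations for the example.
observed = [
    {"limit": 2, "total": 5, "items": ["a", "b"]},
    {"limit": 3, "total": 3, "items": ["a", "b", "c"]},
]
invariants = likely_invariants(observed)

# A response that returns more items than the requested limit.
suspicious = {"limit": 1, "total": 4, "items": ["a", "b"]}
failures = violated(invariants, suspicious)
```

The surviving properties are only "likely" invariants: they hold on the observed data, so more observations can still falsify them.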
Resolving Conditional Implicit Calls to Improve Static and Dynamic Analysis in Android Apps
ACM Transactions on Software Engineering and Methodology · 2025-04-17
Article · Open access
An implicit call is a mechanism that triggers the execution of a method m without a direct call to m in the code being analyzed. For instance, in Android apps the Thread.start() method implicitly executes the Thread.run() method. These implicit calls can be conditionally triggered by programmer-specified constraints that are evaluated at runtime. For instance, the JobScheduler.schedule() method can be called to implicitly execute the JobService.onStartJob() method only if the device’s battery is charging. Such conditional implicit calls can effectively disguise logic bombs, posing significant challenges for both static and dynamic software analyses. Conservative static analysis may produce false-positive alerts due to over-approximation, while less conservative approaches might overlook potential covert behaviors, a serious concern in security analysis. Dynamic analysis may fail to generate the specific inputs required to activate these implicit call targets. To address these challenges, we introduce Archer, a tool designed to resolve conditional implicit calls and extract the constraints triggering execution control transfer. Our evaluation reveals that ① implicit calls are prevalent in Android apps; ② Archer enhances app models’ soundness beyond existing static analysis methods; and ③ Archer successfully infers constraint values, enabling dynamic analyzers to detect (i.e., thanks to better code coverage) and assess conditionally triggered implicit calls.
Do LLMs Generate Useful Test Oracles? An Empirical Study with an Unbiased Dataset
2025-11-16
Article
Generation of thorough test oracles is an open problem. Popular test case generators, like EvoSuite and Randoop, rely on implicit, rule-based, and regression oracles that miss failures that depend on the semantics of the program under test. Formal specifications can yield test oracles but are expensive to create. Large Language Models (LLMs) have the potential to overcome these limitations. The few studies of using LLMs to generate test oracles use modest-sized public benchmarks, such as Defects4J, that are likely to be included in the LLM training data, which threatens the validity of the results. This paper presents an empirical study of the effectiveness of LLMs in generating test oracles. Our experiments use 13,866 test oracles, from 135 Java projects, that were created after the LLMs' training cut-off dates. Thus, our dataset is unbiased. In our experiments, LLMs generated oracles with an average mutation score of 43%, similar to the 45% score of human-designed test oracles. Our results also indicate that the test prefix and the methods called in the program under test provide sufficient information to generate good oracles, while additional code context does not bring relevant benefits. These findings provide actionable insights into using LLMs for automatic testing and highlight their current limitations in generating complex oracles.
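For readers unfamiliar with the metric, the sketch below shows one common way to compute a mutation score: the fraction of mutants on which an oracle fails. It is a toy setup, not the study's tooling. The function under test, the three hand-written mutants, and both oracles are assumptions made for illustration.

```python
from typing import Callable, List

Impl = Callable[[int], int]
Oracle = Callable[[Impl], bool]  # returns True when the assertions pass


def original(x: int) -> int:
    """Function under test: absolute value."""
    return x if x >= 0 else -x


# Hand-written mutants of `original` (hypothetical examples).
mutants: List[Impl] = [
    lambda x: x,                    # negation branch removed
    lambda x: -x,                   # always negates
    lambda x: x if x > 0 else -x,   # equivalent mutant: same behavior
]


def weak_oracle(f: Impl) -> bool:
    return f(3) == 3


def strong_oracle(f: Impl) -> bool:
    return f(3) == 3 and f(-3) == 3


def mutation_score(oracle: Oracle, candidates: List[Impl]) -> float:
    """Fraction of mutants killed, i.e. on which the oracle fails."""
    killed = sum(1 for m in candidates if not oracle(m))
    return killed / len(candidates)


weak_score = mutation_score(weak_oracle, mutants)
strong_score = mutation_score(strong_oracle, mutants)
```

Neither oracle can kill the third mutant because it behaves exactly like the original, which is one reason mutation scores rarely reach 100%.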
Test Oracle Generation for REST APIs - RCR Report
ACM Transactions on Software Engineering and Methodology · 2025-10-14
Article
This Replicated Computational Results (RCR) Report accompanies our TOSEM paper “Test Oracle Generation for REST APIs”. In this work we introduce AGORA+, a black-box approach for automatically generating domain-specific test oracles for REST APIs by detecting invariants: output properties that should always hold. As part of this RCR, we provide a replication package (available at https://doi.org/10.5281/zenodo.12506791) that enables the full reproduction of our results and is designed to pave the way for future research.
Tratto: A Neuro-Symbolic Approach to Deriving Axiomatic Test Oracles
Proceedings of the ACM on Software Engineering · 2025-06-22 · 3 citations
Article
This paper presents Tratto, a neuro-symbolic approach that generates assertions (boolean expressions) that can serve as axiomatic oracles, from source code and documentation. The symbolic module of Tratto takes advantage of the grammar of the programming language, the unit under test, and the context of the unit (its class and available APIs) to restrict the search space of the tokens that can be successfully used to generate valid oracles. The neural module of Tratto uses transformers fine-tuned for both deciding whether to output an oracle or not and selecting the next lexical token to incrementally build the oracle from the set of tokens returned by the symbolic module. Our experiments show that Tratto outperforms the state-of-the-art axiomatic oracle generation approaches, with 73% accuracy, 72% precision, and 61% F1-score, largely higher than the best results of the symbolic and neural approaches considered in our study (61%, 62%, and 37%, respectively). Tratto can generate three times more axiomatic oracles than current symbolic approaches, while generating 10 times less false positives than GPT4 complemented with few-shot learning and Chain-of-Thought prompting.
Lightweight and modular resource leak checking (extended version)
International Journal on Software Tools for Technology Transfer · 2025-04-01 · 2 citations
Article
Repairing Leaks in Resource Wrappers
2025-11-16
Article
A resource leak occurs when a program fails to release a finite resource like a socket, file descriptor or database connection. While sound static analysis tools can detect all leaks, automatically repairing them remains challenging. Prior work took the output of a detection tool and attempted to repair only leaks from a hard-coded list of library resource types. That approach limits the scope of repairable leaks: real-world code uses resource wrappers that store a resource in a field and must themselves be closed. This paper makes four key contributions to improve resource leak repair in the presence of wrappers. (1) It integrates inference of resource management specifications into the repair pipeline, enabling extant fixing approaches to reason about wrappers. (2) It transforms programs into variants that are easier to analyze, making inference, detection, and fixing tools more effective; for instance, it makes detection tools report problems closer to the root cause, often in a client of a resource wrapper rather than within the wrapper class itself. (3) A novel field containment analysis reasons about resource lifetimes, enabling repair of more leaks involving resources stored in fields. (4) It introduces a new repair pattern and more precise reasoning to better handle resources stored in non-final fields. Prior work fixed 41% of resource leak warnings in the NJR benchmark suite; our implementation Arodnap fixes 68%.
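The resource-wrapper pattern described in the abstract can be pictured with a small sketch. This is a Python analogy, not Arodnap or the paper's setting. `Connection`, `Client`, and the open-resource counter are hypothetical names introduced only for this example.

```python
class Connection:
    """Stand-in for a finite resource such as a socket (hypothetical)."""

    open_count = 0  # number of connections currently open

    def __init__(self) -> None:
        Connection.open_count += 1
        self.closed = False

    def close(self) -> None:
        if not self.closed:
            self.closed = True
            Connection.open_count -= 1


class Client:
    """Resource wrapper: stores a resource in a field, so it must be closed."""

    def __init__(self) -> None:
        self._conn = Connection()

    def close(self) -> None:
        self._conn.close()

    def __enter__(self) -> "Client":
        return self

    def __exit__(self, *exc_info: object) -> bool:
        self.close()
        return False  # do not suppress exceptions


def leaky() -> str:
    client = Client()  # wrapper is never closed, so its connection leaks
    return "done"


def repaired() -> str:
    with Client():  # the wrapper is closed on every exit path
        return "done"


leaky()
open_after_leaky = Connection.open_count
repaired()
open_after_repaired = Connection.open_count
```

The leak originates in the client code that forgets to close the wrapper, not in the wrapper class itself, which mirrors the abstract's point about reporting problems closer to the root cause.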
Tratto: A Neuro-Symbolic Approach to Deriving Axiomatic Test Oracles
ArXiv.org · 2025-04-05
Preprint · Open access
Repairing Leaks in Resource Wrappers
ArXiv.org · 2025-10-03
Preprint · Open access
Recent grants
CAREER: Automatically Generating Specifications to Improve Program Correctness and Maintainability
NSF · $300k · 2002–2007
II-NEW: Practical Pluggable Type Systems
NSF · $681k · 2009–2013
FMitF: Formal Verification of Accessibility
NSF · $786k · 2019–2022
SoD-HCER: Testing Designs and Designing Tests
NSF · $200k · 2006–2008
SHF: Small: Always-On Static and Dynamic Feedback
NSF · $513k · 2010–2014
Frequent coauthors
- 30 shared
Yuriy Brun
- 24 shared
Patricia J. Mergo
- 23 shared
René Just
Google (United States)
- 22 shared
Koenraad J. Mortelé
Ghent University Hospital
- 22 shared
Pablo R. Ros
Stony Brook University
- 21 shared
David Notkin
University of Washington
- 20 shared
Zachary Tatlock
University of Washington
- 20 shared
Helena M. Taylor
Florida College
Labs
Program understanding and analysis research projects
Education
- 2002
Ph.D., Computer Science
Massachusetts Institute of Technology (MIT)
- 1999
M.S., Computer Science
Massachusetts Institute of Technology (MIT)
- 1995
B.S., Computer Science
University of California, Berkeley
Awards & honors
- ACM Fellow (2014)
- IEEE Fellow (2021)
- CRA-E Undergraduate Mentoring Award (2018)
- John Backus Award (2009)
- NSF CAREER Award (2002)