Darko Marinov
Professor · Verified · University of Illinois Urbana-Champaign · Computer Science
Active 1999–2026
About
Darko Marinov is a professor at the University of Illinois Urbana-Champaign in the Department of Computer Science, affiliated with the Siebel School of Computing and Data Science. He earned his Ph.D. in Computer Science from the Massachusetts Institute of Technology in 2005. His research areas include programming languages, formal methods, and software engineering. Marinov has taught courses such as Software Engineering I and II, Topics in Software Engineering, and PhD orientation seminars. His recent work has been recognized with awards such as the Test of Time award at FSE 2024, and he has contributed to projects related to aviation certification for the Linux kernel and configuration management for cloud computing. Marinov is actively involved in mentoring students and advancing research in his field.
Research topics
- Computer Science
- Machine Learning
- Engineering
- Programming language
- Reliability engineering
- Software engineering
- Operating system
- Telecommunications
- Data science
Selected publications
2026-04-12
Article · Open access · Senior author
Flaky tests are an important hindrance in practical software development. Researchers have developed many automated techniques for detecting, mitigating, and fixing flaky tests. Such new techniques are often inspired by studying past fixed flaky tests. Several papers present categorizations of flaky tests. However, such categorizations were obtained mostly manually, by researchers reading the fixes (code changes, commit messages, pull request discussions, etc.) and grouping the tests based on their likely root cause.
FastFlip: Compositional SDC Resiliency Analysis
2025-02-22 · 2 citations
Article · Open access
To efficiently harden programs susceptible to Silent Data Corruptions (SDCs), developers need to invoke error injection analyses to find particularly vulnerable instructions and then selectively protect them using appropriate compiler-level SDC detection mechanisms. However, these error injection analyses are both expensive and monolithic: they must be run from scratch after even small changes to the code, such as optimizations or bug fixes. This high recurring cost keeps such software-directed resiliency analyses out of standard software engineering practices such as regression testing. We present FastFlip, the first approach tailored to seamlessly incorporate resiliency analysis within the iterative software development workflow. FastFlip combines empirical error injection and symbolic SDC propagation analyses to enable fast and compositional error injection analysis of evolving programs. When developers modify a program, FastFlip often has to re-analyze only the modified program sections, which can save a significant amount of analysis time. We evaluated FastFlip with five benchmark programs. In our experiments, for each benchmark, we analyzed the original version plus two modified versions. The compositional nature of FastFlip speeds up the analysis of the incrementally modified versions by 3.2× (geomean) and up to 17.2×. The results demonstrate that FastFlip can effectively select a set of instructions to protect against SDCs that minimizes the runtime protection cost while protecting against a developer-specified target fraction of all tested SDC-causing errors.
pytest-ranking: A Regression Test Prioritization Tool for Python
2025-06-23
Article · Open access · Senior author
Regression Test Prioritization (RTP) can find test failures quicker and provide faster feedback to developers to help in debugging. While RTP has been researched for almost three decades, with many research techniques proposed, practical tools and evaluations are sporadic. We present pytest-ranking, a robust tool for Python and its most popular testing framework Pytest. We evaluate our tool on 4,308 builds for 14 open-source Python projects running on the GitHub Actions CI. Our experiments show that our tool integrates well with the Pytest ecosystem, has a low runtime overhead, and finds test failures faster than the default and random order baselines.
Evaluating NonDex for Modern Java Ecosystem
2025-04-27
Article · Senior author
NonDex is a testing approach designed to unveil implementation-dependent (ID) flaky tests stemming from incorrectly relying on a deterministic implementation of a Java API with an underdetermined specification, e.g., iterating over elements of a HashSet object. Since the original NonDex work was published in 2016, we have enhanced the tool functionality and expanded its integration with recent Java versions and build tools like Maven and Gradle. This evolution enables NonDex to analyze a broader range of large, open-source Java projects. This paper investigates our updated NonDex on modern Java projects. We identified 734 ID flaky tests in 31 Maven projects and 267 ID flaky tests in 25 Gradle projects. Comparing these findings to prior work, this study highlights an increased likelihood that a modern Java project contains some ID flaky test(s). We also studied the propagation of ID flakiness through project dependencies and fixed a key non-determinism issue in the Gradle build system itself. Our study emphasizes the importance of proactively employing NonDex to detect and fix flaky tests, preventing potential disruptions in ongoing and future projects. We put all our results at https://github.com/NonDexFTW/NonDex-Experiments.
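The implementation-dependent flakiness described in this abstract can be illustrated with a minimal Python sketch. It is not NonDex itself, which targets Java APIs. The helper `holds_for_all_orders` and both checks are hypothetical names introduced only for illustration. The idea is to re-run a check under several iteration orders of a collection whose order is unspecified and see whether the outcome changes.

```python
import itertools


def holds_for_all_orders(elements, check, limit=24):
    """Return True if check(order) passes for every explored iteration order.

    `elements` is a collection with no guaranteed order (e.g. a set).
    At most `limit` permutations are explored to bound the cost.
    """
    # Sort first so that the exploration itself is deterministic.
    candidates = itertools.permutations(sorted(elements))
    for order in itertools.islice(candidates, limit):
        if not check(list(order)):
            return False
    return True


def first_element_is_a(order):
    # Order-dependent check: wrongly assumes "a" is always iterated first.
    return order[0] == "a"


def contains_all(order):
    # Order-independent check: only the contents matter.
    return sorted(order) == ["a", "b", "c"]


items = {"a", "b", "c"}
print(holds_for_all_orders(items, first_element_is_a))  # False
print(holds_for_all_orders(items, contains_all))        # True
```

A check that passes under one order but fails under another is relying on behavior the specification does not promise, which is the kind of assumption the abstract describes.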
DebCovDiff: Differential Testing of Coverage Measurement Tools on Real-World Projects
2025-11-16
Article
Measuring code coverage is a critical practice in software testing. Incorrect or misleading coverage information reported by automatic tools can increase the software development cost and lead to negative consequences especially for safety-critical software. Ensuring the correctness of coverage measurement tools is therefore important. Prior studies have applied various techniques to find bugs in Gcov and LLVM-cov, the two most widely used coverage tools for C/C++. However, those studies had two limiting factors. First, they used only small, often synthetic, programs, potentially missing bugs in real-world scenarios. Second, they focused only on basic line coverage, neglecting advanced metrics that are both more complex to implement and commonly required for safety-critical software. This paper presents the first empirical study of coverage measurement tools for real-world projects. We implement DebCovDiff, a testing framework that takes Debian packages as the input programs and performs differential testing of Gcov and LLVM-cov, for line coverage and two advanced coverage metrics. We design robust differential oracles to (1) filter out discrepancies arising from subtle differences in the tool output presentation, (2) overcome the nondeterministic nature of certain packages, and (3) support advanced coverage metrics. From results on 47 Debian packages, we identify 34 new bugs, including 2 crashing bugs and 32 deeper bugs that produce wrong coverage reports.
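The idea of a differential oracle for line coverage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DebCovDiff's actual oracle: the function `diff_line_coverage`, the report format (a map from line number to hit count), and the sample data are all hypothetical. The sketch assumes that a line reported by only one tool is a presentation difference, and that only disagreement on covered versus not covered counts as a discrepancy.

```python
def diff_line_coverage(report_a, report_b):
    """Return lines on which two coverage reports disagree.

    Each report maps a line number to a hit count. A line is flagged
    only when both tools report it and exactly one marks it as covered.
    """
    discrepancies = {}
    for line in sorted(set(report_a) | set(report_b)):
        hits_a = report_a.get(line)
        hits_b = report_b.get(line)
        if hits_a is None or hits_b is None:
            # Assumption: a line instrumented by only one tool is a
            # presentation difference, not a bug candidate.
            continue
        if (hits_a > 0) != (hits_b > 0):
            discrepancies[line] = (hits_a, hits_b)
    return discrepancies


# Hypothetical hit counts from two tools for the same source file.
tool_a = {10: 3, 11: 0, 12: 1, 14: 5}
tool_b = {10: 3, 11: 2, 12: 7, 13: 1}
print(diff_line_coverage(tool_a, tool_b))  # {11: (0, 2)}
```

Comparing covered versus not covered, instead of raw hit counts, is one simple way to tolerate counting differences between tools. A real oracle would also need to handle nondeterministic programs, as the abstract notes.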
Transforming the Hybrid Cloud for Emerging AI Workloads
arXiv (Cornell University) · 2024-11-20 · 2 citations
Preprint · Open access
This white paper, developed through close collaboration between IBM Research and UIUC researchers within the IIDAI Institute, envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads through innovative, full-stack co-design approaches, emphasizing usability, manageability, affordability, adaptability, efficiency, and scalability. By integrating cutting-edge technologies such as generative and agentic AI, cross-layer automation and optimization, unified control plane, and composable and adaptive system architecture, the proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. Incorporating quantum computing as it matures will enable quantum-accelerated simulations for materials science, climate modeling, and other high-impact domains. Collaborative efforts between academia and industry are central to this vision, driving advancements in foundation models for material design and climate solutions, scalable multimodal data processing, and enhanced physics-based AI emulators for applications like weather forecasting and carbon sequestration. Research priorities include advancing AI agentic systems, LLM as an Abstraction (LLMaaA), AI model optimization and unified abstractions across heterogeneous infrastructure, end-to-end edge-cloud transformation, efficient programming model, middleware and platform, secure infrastructure, application-adaptive cloud systems, and new quantum-classical collaborative workflows. These ideas and solutions encompass both theoretical and practical research questions, requiring coordinated input and support from the research community. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms, fostering breakthroughs in AI-driven applications and scientific discovery across academia, industry, and society.
Ctest4J: A Practical Configuration Testing Framework for Java
2024-07-10 · 2 citations
Article · Open access
We present Ctest4J, a practical configuration testing framework for Java projects. Configuration testing is a recently proposed approach for finding both misconfigurations and code bugs. Ctest4J addresses the limitations of configuration testing scripts from prior work, including lack of parallel test execution, poor maintainability due to external dependencies, limited integration with modern build systems, and the need for manual instrumentation of configuration API. Ctest4J is a unified framework to write, maintain, and execute configuration tests (Ctests) and integrates with multiple testing frameworks (JUnit4, JUnit5, and TestNG) and build systems (Maven and Gradle). With Ctest4J, Ctests can be maintained similarly to regular unit tests. Ctest4J also provides a utility for automated code instrumentation for common configuration API. We evaluate Ctest4J on 12 open-source projects. We show that Ctest4J effectively enables configuration testing for these projects and speeds up Ctest execution by 3.4X compared to prior scripts. Ctest4J can be found at https://github.com/xlab-uiuc/ctest4j.
FastFlip: Compositional Error Injection Analysis
arXiv (Cornell University) · 2024-03-20
Preprint · Open access
Instruction-level error injection analyses aim to find instructions where errors often lead to unacceptable outcomes like Silent Data Corruptions (SDCs). These analyses require significant time, which is especially problematic if developers wish to regularly analyze software that evolves over time. We present FastFlip, a combination of empirical error injection and symbolic SDC propagation analyses that enables fast, compositional error injection analysis of evolving programs. FastFlip calculates how SDCs propagate across program sections and correctly accounts for unexpected side effects that can occur due to errors. Using FastFlip, we analyze five benchmarks, plus two modified versions of each benchmark. FastFlip speeds up the analysis of incrementally modified programs by 3.2× (geomean). FastFlip selects a set of instructions to protect against SDCs that minimizes the runtime cost of protection while protecting against a developer-specified target fraction of all SDC-causing errors.
Hierarchy-Aware Regression Test Prioritization
2024-10-28 · 2 citations
Article
Regression testing is widely used to check whether software changes lead to test failures. Regression Test Prioritization (RTP) aims to order tests such that tests that are more likely to fail are run earlier. Prior RTP techniques, which we call hierarchy-unaware (HU), ignored an important aspect: real test suites are organized hierarchically, and individual tests belong to composites that can be hierarchically nested. Prior RTP work overlooked the runtime cost to switch across hierarchical test composites and used the APFDc metric, which represents the runtime of tests till test failures, to rank orders generated by RTP techniques. However, APFDc can misleadingly rank orders if their runtimes differ (e.g., two orders may have different numbers of composite switches and, consequently, runtimes). To account for runtime differences, we propose a new metric, HAPFDc. Unlike APFDc, HAPFDc enables proper comparison of test orders with different runtimes by "extending" runtimes as needed. To reduce the cost of composite switching, we introduce hierarchy-aware (HA) RTP by presenting meta-techniques that first prioritize composites and then tests within composites. We evaluate HA RTP on test classes in multi-module Java and Maven projects from two large datasets used in prior work. The results show that our HA RTP improves both HAPFDc values and time-based metrics over HU RTP.
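The two-level ordering described in this abstract (prioritize composites first, then tests within each composite) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the per-test scores, and the rule that a composite's priority is the maximum score of its tests are all assumptions made for the example.

```python
from collections import defaultdict


def hierarchy_aware_order(tests, score):
    """Order tests composite by composite.

    tests: list of (composite, test_name) pairs.
    score: dict mapping each pair to a priority (higher runs earlier).
    """
    groups = defaultdict(list)
    for composite, name in tests:
        groups[composite].append(name)

    def composite_priority(composite):
        # Assumption: a composite is as urgent as its most urgent test.
        return max(score[(composite, name)] for name in groups[composite])

    ordered = []
    # Ties are broken by name so that the result is deterministic.
    for composite in sorted(groups, key=lambda c: (-composite_priority(c), c)):
        names = sorted(groups[composite], key=lambda n: (-score[(composite, n)], n))
        ordered.extend((composite, name) for name in names)
    return ordered


def composite_switches(order):
    """Count how often consecutive tests belong to different composites."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev[0] != cur[0])


# Hypothetical scores for four tests in two composites, A and B.
score = {("A", "t1"): 0.9, ("B", "t2"): 0.8, ("A", "t3"): 0.1, ("B", "t4"): 0.7}
tests = list(score)

flat = sorted(tests, key=lambda t: -score[t])   # hierarchy-unaware order
nested = hierarchy_aware_order(tests, score)    # hierarchy-aware order
print(composite_switches(flat), composite_switches(nested))  # 2 1
```

In this toy input the flat order interleaves the two composites and switches twice, while the nested order keeps each composite contiguous and switches once.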
A benchmark suite and performance analysis of user-space provenance collectors
2024-06-18 · 2 citations
Article · Open access · Senior author
Computational provenance has many important applications, especially to reproducibility. System-level provenance collectors can track provenance data without requiring the user to change anything about their application. However, system-level provenance collectors have performance overheads, and, worse still, different works use different and incomparable benchmarks to assess their performance overhead. This work identifies user-space system-level provenance collectors in prior work, collates the benchmarks, and evaluates each collector on each benchmark. We use benchmark minimization to select a minimal subset of benchmarks, which can be used as goalposts for future work on system-level provenance collectors.
Recent grants
CPS: Synergy: Collaborative Research: Support for Security and Safety of Programmable IoT Systems
NSF · $352k · 2017–2020
NSF · $136k · 2017–2020
SHF: Small: Revisiting Assumptions of Regression Testing
NSF · $462k · 2014–2019
NSF · $437k · 2018–2024
SHF: Small: IMUnit: Improved Multithreaded Unit Testing
NSF · $500k · 2009–2013
Frequent coauthors
- Sarfraz Khurshid · 50 shared
- Milos Gligoric · The University of Texas at Austin · 24 shared
- August Shi · The University of Texas at Austin · 23 shared
- Tao Xie · Peking University · 15 shared
- Lingming Zhang · 14 shared
- Owolabi Legunsen · Cornell University · 13 shared
- Vilas Jagannath · 12 shared
- Danny Dig · 11 shared
Labs
Darko Marinov · PI
Awards & honors
- Test of Time award at FSE 2024
- Undergraduate Research Mentorship Award