Rene Just

Professor · Verified

University of Washington · Computer Science & Engineering

Active 1984–2026

h-index: 26

Citations: 3.9k

Papers: 96 · 51 in the last 5 years

Funding: $1.1M · 1 active



About

René Just is an Associate Professor at the University of Washington in the Paul G. Allen School of Computer Science & Engineering. His research interests encompass software engineering, software security, and data science, with particular focus on static and dynamic program analysis, developer productivity, and applied statistics and machine learning. He has developed research and educational infrastructures that are widely adopted by other researchers and instructors, such as Defects4J and the Major mutation framework. René Just is recognized for his contributions through awards including an NSF CAREER Award and two Most Influential Paper (10-year test-of-time) Awards. His work has also received multiple distinguished paper awards and honorable mentions.

Research topics

Computer Science
Machine Learning
Programming language
Data Mining
Artificial Intelligence
Software engineering
Data science
Genetics
Mathematics
Biology
Theoretical computer science
Statistics
Engineering
Reliability engineering
Database

Selected publications

Regulating AI: Where U.S. State Policy and HCI (Mis)align
2026-04-13 · 1 citation
articleOpen access
Artificial intelligence (AI) technologies are increasingly adopted into everyday life, with most investment and development concentrated in the U.S. In response to rapid AI integration and scant federal guidelines, U.S. states have formed AI committees charged with studying AI-related societal trade-offs. We analyzed the 18 existing state-level AI committee reports to understand how policymakers discuss AI-related benefits and risks. We then compared the risks surfaced by policymakers to an established taxonomy of AI risks aggregated from the literature and examined where policymakers’ concerns align with, or diverge from, those of HCI scholars. These insights provide important mileposts for shaping ongoing policy initiatives and future research. Our findings reveal important gaps: while committees invoke responsible AI, their framings often omit broader socio-technical concerns emphasized in HCI. We discuss opportunities for HCI to support socio-technical perspectives, employ participatory design, and close the gap between research and policy.
Publisher DOI
Solver-Aided Verification of Policy Compliance in Tool-Augmented LLM Agents
ArXiv.org · 2026-03-20
articleOpen accessSenior author
Tool-augmented Large Language Models (TaLLMs) extend LLMs with the ability to invoke external tools, enabling them to interact with real-world environments. However, a major limitation in deploying TaLLMs in sensitive applications such as customer service and business process automation is a lack of reliable compliance with domain-specific operational policies regarding tool-use and agent behavior. Current approaches merely steer LLMs to adhere to policies by including policy descriptions in the LLM context, but these provide no guarantees that policy violations will be prevented. In this paper, we introduce an SMT solver-aided framework to enforce tool-use policy compliance in TaLLM agents. Specifically, we use an LLM-assisted, human-guided approach to translate natural-language-specified tool-use policies into formal logic (SMT-LIB-2.0) constraints over agent-observable state and tool arguments. At runtime, planned tool calls are intercepted and checked against the constraints using the Z3 solver as a pre-condition to the tool call. Tool invocations that violate the policy are blocked. We evaluated the framework on the TauBench benchmark and show that solver-aided policy checking reduces policy violations while maintaining overall task accuracy. These results suggest that integrating formal reasoning into TaLLM execution can improve tool-call policy compliance and overall reliability.
Publisher OA PDF
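The runtime check described above can be pictured as a guard between the agent's planned tool call and its execution. The following is a simplified sketch, not the paper's implementation: each policy rule is a plain Python predicate over agent-observable state and tool arguments, standing in for the SMT-LIB constraints that the paper checks with Z3. The tool `issue_refund`, the state fields, and the rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Simplified stand-in for solver-aided checking: each rule is a Python
# predicate over (agent-observable state, tool arguments). The paper
# instead encodes rules as SMT-LIB constraints and queries Z3.
Rule = Callable[[Dict[str, Any], Dict[str, Any]], bool]


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


class PolicyViolation(Exception):
    """Raised when a planned tool call would break a policy rule."""


# Hypothetical policy for a hypothetical refund tool.
POLICIES: Dict[str, List[Rule]] = {
    "issue_refund": [
        lambda state, args: state["order_status"] == "delivered",
        lambda state, args: 0 < args["amount"] <= state["order_total"],
    ],
}


def guarded_call(call: ToolCall, state: Dict[str, Any],
                 tools: Dict[str, Callable[..., Any]]) -> Any:
    """Check every rule for this tool as a pre-condition; block the call on any failure."""
    for rule in POLICIES.get(call.name, []):
        if not rule(state, call.args):
            raise PolicyViolation(f"blocked {call.name}({call.args})")
    return tools[call.name](**call.args)


tools = {"issue_refund": lambda amount: f"refunded {amount}"}
state = {"order_status": "delivered", "order_total": 50}
print(guarded_call(ToolCall("issue_refund", {"amount": 30}), state, tools))  # refunded 30
try:
    guarded_call(ToolCall("issue_refund", {"amount": 80}), state, tools)
except PolicyViolation as err:
    print(err)  # blocked issue_refund({'amount': 80})
```

Replacing the predicates with solver queries would keep the same interception point while letting each rule be written once as a formal constraint.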
What Types of Automated Tests do Developers Write?
2025-04-28
articleSenior author
Software testing is a widely adopted quality assurance technique that assesses whether a software system meets a given specification. The overall goal of software testing is to develop effective tests that capture desired program behaviors and reveal defects. Automated software testing is an essential part of modern software development processes, in particular those that focus on continuous integration and deployment. Existing test classifications (e.g., unit vs. integration vs. system tests) and testing best practices offer general conceptual frameworks, but instantiating these conceptual models requires a definition of what is considered a unit, or even a test. These conceptual models are rarely explicated in the literature or documentation, which makes interpretation and generalization of results (e.g., comparisons between unit and integration testing efficacy) difficult. Additionally, comparatively little is known about how developers operationalize software testing in modern industrial contexts, how they write and automate software tests, and how well those tests fit into existing classifications. Since software engineering processes have substantially evolved, it is time to revisit and refine test classifications to support future research on software testing efficacy and best practices. This is especially important with the advent of AI-generated test code, where those classifications may be used to automatically classify the types of generated tests or to formulate the desired test output. This paper presents a novel test classification framework, developed using insights and data on what types of tests developers write in practice. The data was collected in an industrial setting at Google and involves tens of thousands of developers and tens of millions of tests. The developed classification framework is precise enough that it can be encoded in an automated analysis.
We describe our proof-of-concept implementation and report on the development approach and costs. We also report on the results of applying the automated classification to all tests in Google’s repository and on what types of automated tests developers write.
Publisher DOI
Evaluating the Impact of Scaffolding and Visualizations for Mutation Testing Exercises in Software Engineering Education
2025-06-23 · 1 citation
articleOpen accessSenior author
Mutation testing is an effective testing technique for improving how well a test suite can detect small changes to a program under test. This testing technique is seeing increased industry adoption. This paper aims to study the use of mutation testing in an educational setting and understand students' technical and conceptual challenges in applying mutation testing concepts. We report on two case studies of incorporating mutation testing into software engineering curricula.
Publisher DOI
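A minimal sketch of the mutation-testing loop, assuming Python's standard `ast` module as the mutation engine (this is not how the Major framework or the course materials work): each mutant changes one relational operator, and a mutant counts as "killed" when at least one test fails on it. The function `max_of` and the tests are hypothetical.

```python
import ast
from typing import Callable, Iterator, Tuple

# Hypothetical program under test.
SOURCE = """
def max_of(a, b):
    if a > b:
        return a
    return b
"""

# Relational operator replacement: each mutant swaps the first operator of one comparison.
SWAPS = {ast.Gt: [ast.GtE, ast.Lt], ast.Lt: [ast.LtE, ast.Gt]}


def mutants(source: str) -> Iterator[str]:
    """Yield the source of each mutant; every mutant changes one comparison."""
    original = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Compare)]
    for i, node in enumerate(original):
        for replacement in SWAPS.get(type(node.ops[0]), []):
            tree = ast.parse(source)  # fresh tree so mutations do not accumulate
            target = [n for n in ast.walk(tree) if isinstance(n, ast.Compare)][i]
            target.ops[0] = replacement()
            yield ast.unparse(tree)


def load(source: str) -> Callable[[int, int], int]:
    """Execute the source and return the function under test."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["max_of"]


def suite_passes(fn, tests) -> bool:
    return all(fn(*args) == expected for args, expected in tests)


def mutation_score(source: str, tests) -> Tuple[int, int]:
    """Return (killed, total): a mutant is killed if any test fails on it."""
    outcomes = [not suite_passes(load(m), tests) for m in mutants(source)]
    return sum(outcomes), len(outcomes)


TESTS = [((1, 2), 2), ((3, 1), 3)]
assert suite_passes(load(SOURCE), TESTS)  # the original program passes
killed, total = mutation_score(SOURCE, TESTS)
print(f"{killed}/{total} mutants killed")  # 1/2 mutants killed
```

The `a < b` mutant is killed, while the `a >= b` mutant survives because it is equivalent to the original (both return the same value when `a == b`). Equivalent mutants like this are one reason a mutation score below 100% does not by itself mean the test suite is weak.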
Wildfire and Forest Management: Opportunities for HCI Research
ACM Transactions on Computer-Human Interaction · 2025-09-02 · 2 citations
article
Wildfire and forest management increasingly rely on geospatial technologies, i.e., data and tools contributing to the geographic mapping and analysis of the Earth, to inform measures for the control of wildfires. Nevertheless, challenges arising from domain experts adopting these complex, non-intuitive technologies are not well understood. We interviewed 12 participants in wildfire and forest management to explore the technical and socio-technical nature of these challenges, revealing that (1) knowledge and data are fragmented across stakeholders, ranging from governmental agencies to small landowners. This fragmentation causes participants to (2) struggle in sharing knowledge and expertise. Participants (3) voice concerns about model bias since decisions informed by geospatial technologies can have far-reaching impacts. Yet, they (4) face barriers engaging people most impacted by these decisions. We detail an HCI research agenda that includes: exploring opportunities to connect stakeholders and share knowledge, standardizing decision-making, and engaging local communities.
Publisher DOI
Resolving Conditional Implicit Calls to Improve Static and Dynamic Analysis in Android Apps
ACM Transactions on Software Engineering and Methodology · 2025-04-17
articleOpen access
An implicit call is a mechanism that triggers the execution of a method m without a direct call to m in the code being analyzed. For instance, in Android apps the Thread.start() method implicitly executes the Thread.run() method. These implicit calls can be conditionally triggered by programmer-specified constraints that are evaluated at runtime. For instance, the JobScheduler.schedule() method can be called to implicitly execute the JobService.onStartJob() method only if the device’s battery is charging. Such conditional implicit calls can effectively disguise logic bombs, posing significant challenges for both static and dynamic software analyses. Conservative static analysis may produce false-positive alerts due to over-approximation, while less conservative approaches might overlook potential covert behaviors, a serious concern in security analysis. Dynamic analysis may fail to generate the specific inputs required to activate these implicit call targets. To address these challenges, we introduce Archer, a tool designed to resolve conditional implicit calls and extract the constraints triggering execution control transfer. Our evaluation reveals that ① implicit calls are prevalent in Android apps; ② Archer enhances app models’ soundness beyond existing static analysis methods; and ③ Archer successfully infers constraint values, enabling dynamic analyzers to detect (thanks to better code coverage) and assess conditionally triggered implicit calls.
Publisher OA PDF DOI
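Both kinds of implicit call in the abstract can be illustrated outside Android. Python's `threading.Thread.start()` behaves like the Java `Thread.start()` example: it arranges for `run()` to execute in a new thread, with no call site for `run()` in user code. The conditional case is modelled with a hypothetical `ToyJobScheduler` that invokes a job's callback only when a device-state constraint holds; it mimics the shape of `JobScheduler.schedule()` and `JobService.onStartJob()`, but it is neither the Android API nor Archer.

```python
import threading


class Worker(threading.Thread):
    def __init__(self) -> None:
        super().__init__()
        self.ran = False

    def run(self) -> None:
        # Nothing below calls run() directly; Thread.start() triggers it.
        self.ran = True


class ToyJob:
    """Hypothetical job whose callback is reached only implicitly."""

    def __init__(self) -> None:
        self.started = False

    def on_start_job(self) -> None:
        self.started = True


class ToyJobScheduler:
    """Hypothetical constraint-gated scheduler (a stand-in, not Android's JobScheduler)."""

    def __init__(self, device_state: dict) -> None:
        self.device_state = device_state

    def schedule(self, job: ToyJob, requires_charging: bool) -> bool:
        # The implicit call is conditional: the callback runs only if the constraint holds.
        if requires_charging and not self.device_state["charging"]:
            return False
        job.on_start_job()
        return True


worker = Worker()
worker.start()
worker.join()
print(worker.ran)  # True

job = ToyJob()
ToyJobScheduler({"charging": False}).schedule(job, requires_charging=True)
print(job.started)  # False: constraint not satisfied, callback never reached
ToyJobScheduler({"charging": True}).schedule(job, requires_charging=True)
print(job.started)  # True
```

A real scheduler defers the callback rather than calling it synchronously. In the Android case, the framework code that makes the call lies outside the app being analyzed, so an analysis of the app alone sees no edge from `schedule` to the callback, and a dynamic analysis reaches the callback only if it produces the triggering state (here, `charging` set to `True`).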
Morello: Compiling Fast Neural Networks with Dynamic Programming and Spatial Compression
ArXiv.org · 2025-05-03
preprintOpen access
High-throughput neural network inference requires coordinating many optimization decisions, including parallel tiling, microkernel selection, and data layout. The product of these decisions forms a search space of programs which is typically intractably large. Existing approaches (e.g., auto-schedulers) often address this problem by sampling this space heuristically. In contrast, we introduce a dynamic-programming-based approach to explore more of the search space by iteratively decomposing large program specifications into smaller specifications reachable from a set of rewrites, then composing a final program from each rewrite that minimizes an affine cost model. To reduce memory requirements, we employ a novel memoization table representation, which indexes specifications by coordinates in $\mathbb{Z}_{\geq 0}$ and compresses identical, adjacent solutions. This approach can visit a much larger set of programs than prior work. To evaluate the approach, we developed Morello, a compiler which lowers specifications roughly equivalent to a few-node XLA computation graph to x86. Notably, we found that an affine cost model is sufficient to surface high-throughput programs. For example, Morello synthesized a collection of matrix multiplication benchmarks targeting a Zen 1 CPU, including a 1x2048x16384, bfloat16-to-float32 vector-matrix multiply, which was integrated into Google's gemma.cpp.
Publisher OA PDF DOI
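A toy version of the memoization idea, under strong simplifying assumptions: a single non-negative integer coordinate (a problem size `n`) instead of Morello's multi-dimensional specification coordinates, a hypothetical tile-size choice from `TILES`, and a made-up affine cost `FIXED + PER_ELEM * tile` per tile. The dynamic program records the first tile of the cheapest decomposition for every `n`, and the hypothetical `RunLengthTable` stores each run of identical adjacent decisions once. Costs are kept in a plain list for clarity, so only the decision table is compressed.

```python
import bisect
from typing import Any, List, Tuple

TILES = (1, 2, 4, 8)        # hypothetical tile sizes
FIXED, PER_ELEM = 3.0, 1.0  # hypothetical affine cost: FIXED + PER_ELEM * tile


class RunLengthTable:
    """Maps coordinates 0, 1, 2, ... to values, storing each run of equal adjacent values once."""

    def __init__(self) -> None:
        self.starts: List[int] = []  # first coordinate of each run
        self.values: List[Any] = []  # value shared by the whole run
        self.next_coord = 0

    def append(self, coord: int, value: Any) -> None:
        if coord != self.next_coord:
            raise ValueError("coordinates must be inserted contiguously from 0")
        if not self.values or self.values[-1] != value:
            self.starts.append(coord)
            self.values.append(value)
        self.next_coord += 1

    def get(self, coord: int) -> Any:
        if not 0 <= coord < self.next_coord:
            raise KeyError(coord)
        # The run containing coord is the last one starting at or before it.
        return self.values[bisect.bisect_right(self.starts, coord) - 1]


def plan(n_max: int) -> Tuple[List[float], RunLengthTable]:
    """DP over sizes: cost[n] = min over tiles t <= n of (FIXED + PER_ELEM*t + cost[n - t])."""
    cost = [0.0]
    first_tile = RunLengthTable()
    first_tile.append(0, None)  # size 0 needs no tile
    for n in range(1, n_max + 1):
        # Ties are broken toward the larger tile (key -t), which keeps runs long.
        best_cost, neg_t = min((FIXED + PER_ELEM * t + cost[n - t], -t) for t in TILES if t <= n)
        cost.append(best_cost)
        first_tile.append(n, -neg_t)
    return cost, first_tile


cost, first_tile = plan(20)
print(len(first_tile.starts), "runs for", first_tile.next_coord, "sizes")  # 5 runs for 21 sizes
print(first_tile.get(6), first_tile.get(20), cost[20])  # 4 8 29.0
```

The tie-break matters: several decompositions can have equal cost, and choosing among them consistently is what lets adjacent coordinates share a stored solution.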
A Taxonomy of Failures in Tool-Augmented LLMs
2025-04-28 · 2 citations
articleSenior author
Large language models (LLMs) can perform a variety of tasks given a user prompt that contains a description of the task. To enhance the performance of LLMs, recent research has focused on augmenting LLMs with external tools, such as Python APIs, REST APIs, and other deep learning models. Much of the research on tool-augmented LLMs (TaLLMs) has focused on improving their capabilities and accuracy. However, research on understanding and characterizing the kinds of failures that can occur in these systems is lacking. To address this gap, this paper proposes a taxonomy of failures in TaLLMs and their root causes, details an analysis of the failures that occur in two published TaLLMs (Gorilla and Chameleon), and provides recommendations for testing and repair of TaLLMs.
Publisher DOI
AI-Assisted Assessment of Coding Practices in Modern Code Review
2024-07-10 · 19 citations
preprintOpen accessSenior author
Modern code review is a process in which an incremental code contribution made by a code author is reviewed by one or more peers before it is committed to the version control system. An important element of modern code review is verifying that code contributions adhere to best practices. While some of these best practices can be automatically verified, verifying others is commonly left to human reviewers. This paper reports on the development, deployment, and evaluation of AutoCommenter, a system backed by a large language model that automatically learns and enforces coding best practices. We implemented AutoCommenter for four programming languages (C++, Java, Python, and Go) and evaluated its performance and adoption in a large industrial setting. Our evaluation shows that an end-to-end system for learning and enforcing coding best practices is feasible and has a positive impact on the developer workflow. Additionally, this paper reports on the challenges associated with deploying such a system to tens of thousands of developers and the corresponding lessons learned.
Publisher DOI
rTisane: Externalizing conceptual models for data analysis prompts reconsideration of domain assumptions and facilitates statistical modeling
2024-05-11 · 2 citations
articleOpen accessSenior author
Statistical models should accurately reflect analysts’ domain knowledge about variables and their relationships. While recent tools let analysts express these assumptions and use them to produce a resulting statistical model, it remains unclear what analysts want to express and how externalization impacts statistical model quality. This paper addresses these gaps. We first conduct an exploratory study of analysts using a domain-specific language (DSL) to express conceptual models. We observe a preference for detailing how variables relate and a desire to allow, and then later resolve, ambiguity in their conceptual models. We leverage these findings to develop rTisane, a DSL for expressing conceptual models augmented with an interactive disambiguation process. In a controlled evaluation, we find that analysts reconsidered their assumptions, self-reported externalizing their assumptions accurately, and maintained analysis intent with rTisane. Additionally, rTisane enabled some analysts to author statistical models they were unable to specify manually. For others, rTisane resulted in models that better fit the data or enabled iterative improvement.
Publisher OA PDF DOI

Recent grants

CAREER: Toward Effective, Predictable, and Consistent Software Testing
NSF · $550k · 2020–2026
CRI: CI-EN: Collaborative Research: An Experimental Infrastructure and a Database of Real Faults to Foster Reproducibility in Software Engineering Research
NSF · $582k · 2018–2021

Frequent coauthors

Gordon Fraser
University of Passau
37 shared
Michael D. Ernst
Seattle University
23 shared
Marko Ivanković
Google (Switzerland)
21 shared
Andrea Arcuri
OsloMet – Oslo Metropolitan University
17 shared
Goran Petrović
Google (Switzerland)
15 shared
Małgorzata Salawa
Google (United States)
11 shared
Manushree Vijayvergiya
Google (Switzerland)
11 shared
Benjamin Kushigian
Seattle University
9 shared

Labs

Rene Just Lab · PI
Not provided

Awards & honors

NSF CAREER Award
Two Most Influential Paper (10-year test-of-time) Awards
