Sebastian Elbaum

Professor of Computer Science

University of Virginia · Computer Science

Active 1999–2026

h-index: 44

Citations: 9.0k

Papers: 226 (56 in the last 5 years)

Funding: $4.3M (1 active grant)



About

Sebastian Elbaum is a Professor in the Department of Computer Science at the University of Virginia. He co-leads the Lab for Engineering Safe Software (LESS Lab) and focuses on building safe autonomous systems. His research interests include Software Engineering and Autonomous Systems. He has received numerous awards, including an NSF Career Award, an IBM Innovation Award, and a Google Faculty Research Award, and he is an ACM Fellow and an IEEE Fellow. He also serves as an Adjunct Senior Fellow for Emerging Computing Technologies at the Council on Foreign Relations, where he works at the intersection of autonomous systems, AI, and national security policy.

Research topics

Computer Science
Artificial Intelligence
Computer Security
Software Engineering
Programming Languages
Distributed computing

Selected publications

STADA: Specification-based Testing for Autonomous Driving Agents
arXiv (Cornell University) · 2026-03-11
preprintOpen access
Simulation-based testing has become a standard approach to validating autonomous driving agents prior to real-world deployment. A high-quality validation campaign will exercise an agent in diverse contexts composed of varying static environments (e.g., lanes, intersections, signage) and dynamic elements (e.g., vehicles and pedestrians). To achieve this, existing test generation techniques rely on template-based, manually constructed, or random scenario generation. When applied to validate formally specified safety requirements, such methods either require significant human effort or run the risk of missing important behavior related to the requirement. To address this gap, we present STADA, a Specification-based Test generation framework for Autonomous Driving Agents that systematically generates the space of scenarios defined by a formal specification expressed in temporal logic (LTLf). Given a specification, STADA constructs all distinct initial scenes, a diverse space of continuations of those scenes, and simulations that reflect the behaviors of the specification. Evaluation of STADA on a variety of LTLf specifications formalized in SCENEFLOW, using three complementary coverage criteria, demonstrates that STADA yields more than 2x higher coverage than the best baseline on the finest criterion and a 75% increase on the coarsest criterion. Moreover, it matches the coverage of the best baseline with 6x fewer simulations. While set in the context of autonomous driving, the approach is applicable to other domains with rich simulation environments.
Publisher DOI
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
ArXiv.org · 2026-04-23
articleOpen accessSenior author
Validating Autonomous Vehicles (AVs) requires exposure to rare, safety-critical scenarios that are infrequent in routine driving data. Existing benchmarks address this by generating synthetic conflicts or by mapping accident descriptions to abstract road geometries, and thus fail to capture the topological complexity of real-world crashes. We introduce TRACE, a pipeline that automates the reconstruction of NHTSA crash reports into high-fidelity CARLA simulations by (1) retrieving site-specific OpenStreetMap data to preserve exact road topology, (2) leveraging Large Language Models to infer vehicles' initial states from road geometry and pre-crash maneuvers, and (3) generating simulation trajectories from semi-structured report data. Using this pipeline, we curated a benchmark of 52 diverse accident scenarios covering varied collision types, road topologies, and pre-crash maneuvers, providing a challenging open-source resource for testing AV systems against real-world failures.
Publisher OA PDF
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-05
otherOpen accessSenior author
Full Changelog: https://github.com/NahianSalsabil/TRACE-benchmark-and-pipeline/commits/v1.0-fse-tool-demo
Publisher DOI
LabelAny3D: Label Any Object 3D in the Wild
ArXiv.org · 2026-01-04
articleOpen access
Detecting objects in 3D space from monocular input is crucial for applications ranging from robotics to scene understanding. Despite strong performance in the indoor and autonomous driving domains, existing monocular 3D detection models struggle with in-the-wild images due to the lack of in-the-wild 3D datasets and the difficulty of 3D annotation. We introduce LabelAny3D, an analysis-by-synthesis framework that reconstructs holistic 3D scenes from 2D images to efficiently produce high-quality 3D bounding box annotations. Built on this pipeline, we present COCO3D, a new benchmark for open-vocabulary monocular 3D detection, derived from the MS-COCO dataset and covering a wide range of object categories absent from existing 3D datasets. Experiments show that annotations generated by LabelAny3D improve monocular 3D detection performance across multiple benchmarks and exceed prior auto-labeling approaches in quality. These results demonstrate the promise of foundation-model-driven annotation for scaling up 3D recognition in realistic, open-world settings.
Publisher OA PDF
T4PC: Training Deep Neural Networks for Property Conformance
IEEE Transactions on Software Engineering · 2025-08-25
article
The increasing integration of Deep Neural Networks (DNNs) into safety-critical systems, such as Autonomous Vehicles (AVs), where failures can have significant consequences, has fostered the development of many Verification and Validation (V&V) techniques. However, these techniques are applied mainly after DNN training is complete. This delayed application means that any property violations found require restarting the expensive training process, and that V&V techniques struggle to check increasingly large and sophisticated DNNs. To address this issue, we propose T4PC, a framework to increase property conformance during DNN training. Property conformance is increased by enriching: 1) the data preparation phase to account for the satisfaction of properties' pre- and postconditions, and 2) the training phase to account for property satisfaction by incorporating a new property loss term that is integrated with the main loss. Our family of controlled experiments targeting a navigation DNN shows that T4PC can effectively train it for conformance to single and multiple properties, and can also fine-tune an existing navigation DNN, originally trained for accuracy, for conformance. Our case study in simulation, applying T4PC to fine-tune two open-source AV systems operating in the CARLA simulator, shows that it can reduce targeted driving violations while retaining the systems' original driving capabilities.
Publisher DOI
Closing the Gap Between Sensor Inputs and Driving Properties: A Scene Graph Generator for CARLA
2025-04-27 · 2 citations
article
The software engineering community has increasingly taken up the task of assuring safety in autonomous driving systems by applying software engineering principles to create techniques to develop, validate, and verify these systems. However, developing and analyzing these techniques requires extensive sensor datasets and execution infrastructure with the relevant features and known semantics for the task at hand. While the community has invested substantial effort in gathering and curating large-scale datasets and developing simulation infrastructure with varying features, semantic understanding of this data has remained out of reach, relying on limited, manually crafted datasets or bespoke simulation environments to ensure the desired semantics are met. To address this, we developed CARLASGG, a plugin for the widely used autonomous driving simulator CARLA that extracts relevant ground-truth spatial and semantic information from the simulator state at runtime in the form of scene graphs, enabling online and post-hoc automatic reasoning about the semantics of the scenario and associated sensor data. The tool has been deployed in the evaluations of multiple prior software engineering approaches, which we describe to demonstrate its utility. The tool enables the client to adjust the precision of the semantic information captured in the scene graph to suit the client application's needs. We provide a detailed description of the tool's design, capabilities, and configurations, with additional documentation accompanying the tool's online source: https://github.com/less-lab-uva/carla_scene_graphs.
Publisher DOI

Recent grants

SHF: Small: Holistic Analysis: integrating the semantics of the world and the code
NSF · $416k · 2018–2021
SHF: Medium: More Reliable Image Networks through Scene-based Specification, Neuro-symbolic Training, and Systematic Specification-driven Testing
NSF · $1.2M · 2023–2027
SHF: Small: Solving the Search for Relevant Code in Large Repositories with Lightweight Specifications
NSF · $449k · 2012–2016
SHF: Small: T2T: A Framework for Amplifying Testing Resources
NSF · $492k · 2009–2013
NRI: INT: COLLAB: Raining Drones: Mid-Air Release & Recovery of Atmospheric Sensing Systems
NSF · $404k · 2019–2023

Frequent coauthors

Matthew B. Dwyer
55 shared
Gregg Rothermel
35 shared
Carrick Detweiler
28 shared
David Shriver
16 shared
David S. Rosenblum
George Mason University
15 shared
Kathryn T. Stolee
North Carolina State University
14 shared
John‐Paul Ore
North Carolina State University
11 shared
Trey Woodlief
University of Virginia
11 shared

Awards & honors

NSF Career Award
IBM Innovation Award
Google Faculty Research Award
FSE Test of Time Award
Five ACM SIGSOFT Distinguished Paper Awards
