
Sebastian Elbaum
Professor of Computer Science · University of Virginia · Computer Science
Active 1999–2026
About
Sebastian Elbaum is a Professor in the Department of Computer Science at the University of Virginia, where he co-leads the Lab for Engineering Safe Software (LESS Lab) and focuses on building safe autonomous systems. His research interests include software engineering and autonomous systems. His honors include an NSF CAREER Award, an IBM Innovation Award, and a Google Faculty Research Award, and he is an ACM Fellow and an IEEE Fellow. He also serves as Adjunct Senior Fellow for Emerging Computing Technologies at the Council on Foreign Relations, where he connects autonomous systems and AI to national security policy.
Research topics
- Computer Science
- Artificial Intelligence
- Computer Security
- Software Engineering
- Programming Languages
- Distributed Computing
Selected publications
STADA: Specification-based Testing for Autonomous Driving Agents
arXiv (Cornell University) · 2026-03-11
preprint · Open access
Simulation-based testing has become a standard approach to validating autonomous driving agents prior to real-world deployment. A high-quality validation campaign will exercise an agent in diverse contexts composed of varying static environments (e.g., lanes, intersections, signage) and dynamic elements (e.g., vehicles, pedestrians). To achieve this, existing test generation techniques rely on template-based, manually constructed, or random scenario generation. When applied to validate formally specified safety requirements, such methods either require significant human effort or risk missing important behavior related to the requirement. To address this gap, we present STADA, a Specification-based Test generation framework for Autonomous Driving Agents that systematically generates the space of scenarios defined by a formal specification expressed in linear temporal logic over finite traces (LTLf). Given a specification, STADA constructs all distinct initial scenes, a diverse space of continuations of those scenes, and simulations that reflect the behaviors of the specification. Evaluation of STADA on a variety of LTLf specifications formalized in SCENEFLOW, using three complementary coverage criteria, demonstrates that STADA yields more than 2x higher coverage than the best baseline on the finest criterion and a 75% increase on the coarsest criterion. Moreover, it matches the coverage of the best baseline with 6x fewer simulations. While set in the context of autonomous driving, the approach is applicable to other domains with rich simulation environments.
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
ArXiv.org · 2026-04-23
article · Open access · Senior author
Validating Autonomous Vehicles (AVs) requires exposure to rare, safety-critical scenarios that are infrequent in routine driving data. Existing benchmarks address this by generating synthetic conflicts or by mapping accident descriptions to abstract road geometries, and thus fail to capture the topological complexity of real-world crashes. We introduce TRACE, a pipeline that automates the reconstruction of NHTSA crash reports into high-fidelity CARLA simulations by (1) retrieving site-specific OpenStreetMap data to preserve exact road topology, (2) leveraging Large Language Models to infer vehicles' initial states from road geometry and pre-crash maneuvers, and (3) generating simulation trajectories from semi-structured report data. Using this pipeline, we curated a benchmark of 52 diverse accident scenarios covering varied collision types, road topologies, and pre-crash maneuvers, providing a challenging open-source resource for testing AV systems against real-world failures.
TRACE: Topology-aware Reconstruction of Accidents in CARLA for AV Evaluation
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-05
other · Open access · Senior author
Full Changelog: https://github.com/NahianSalsabil/TRACE-benchmark-and-pipeline/commits/v1.0-fse-tool-demo
LabelAny3D: Label Any Object 3D in the Wild
ArXiv.org · 2026-01-04
article · Open access
Detecting objects in 3D space from monocular input is crucial for applications ranging from robotics to scene understanding. Despite strong performance in the indoor and autonomous driving domains, existing monocular 3D detection models struggle with in-the-wild images due to the lack of in-the-wild 3D datasets and the difficulty of 3D annotation. We introduce LabelAny3D, an analysis-by-synthesis framework that reconstructs holistic 3D scenes from 2D images to efficiently produce high-quality 3D bounding box annotations. Built on this pipeline, we present COCO3D, a new benchmark for open-vocabulary monocular 3D detection, derived from the MS-COCO dataset and covering a wide range of object categories absent from existing 3D datasets. Experiments show that annotations generated by LabelAny3D improve monocular 3D detection performance across multiple benchmarks and are of higher quality than those produced by prior auto-labeling approaches. These results demonstrate the promise of foundation-model-driven annotation for scaling up 3D recognition in realistic, open-world settings.
T4PC: Training Deep Neural Networks for Property Conformance
IEEE Transactions on Software Engineering · 2025-08-25
article
The increasing integration of Deep Neural Networks (DNNs) into safety-critical systems such as Autonomous Vehicles (AVs), where failures can have significant consequences, has fostered the development of many Verification and Validation (V&V) techniques. However, these techniques are applied mainly after DNN training is complete. This delayed application means that any property violations found require restarting the expensive training process, and that V&V techniques struggle to check increasingly large and sophisticated DNNs. To address this issue, we propose T4PC, a framework to increase property conformance during DNN training. Property conformance is increased by enriching 1) the data preparation phase, to account for the satisfaction of properties' pre- and postconditions, and 2) the training phase, to account for property satisfaction by incorporating a new property loss term that is integrated with the main loss. Our family of controlled experiments targeting a navigation DNN shows that T4PC can effectively train it for conformance to single and multiple properties, and can also fine-tune an existing navigation DNN, originally trained for accuracy, for conformance. Our simulation case study, which applies T4PC to fine-tune two open-source AV systems operating in the CARLA simulator, shows that it can reduce targeted driving violations while retaining the systems' original driving capabilities.
Closing the Gap Between Sensor Inputs and Driving Properties: A Scene Graph Generator for CARLA
2025-04-27 · 2 citations
article
The software engineering community has increasingly taken up the task of assuring safety in autonomous driving systems by applying software engineering principles to create techniques to develop, validate, and verify these systems. However, developing and analyzing these techniques requires extensive sensor datasets and execution infrastructure with the relevant features and known semantics for the task at hand. While the community has invested substantial effort in gathering and curating large-scale datasets and in developing simulation infrastructure with varying features, semantic understanding of this data has remained out of reach. Researchers instead rely on limited, manually crafted datasets or bespoke simulation environments to ensure the desired semantics are met. To address this, we developed CARLASGG, a plugin for the widely used autonomous driving simulator CARLA that extracts relevant ground-truth spatial and semantic information from the simulator state at runtime in the form of scene graphs, enabling online and post-hoc automatic reasoning about the semantics of the scenario and associated sensor data. The tool has been successfully deployed in the evaluations of multiple prior software engineering approaches, which we describe to demonstrate its utility. The tool lets the client adjust the precision of the semantic information captured in the scene graph to suit application needs. We provide a detailed description of the tool's design, capabilities, and configurations, with additional documentation accompanying the tool's online source: https://github.com/less-lab-uva/carla_scene_graphs.
Recent grants
SHF: Small: Holistic Analysis: integrating the semantics of the world and the code
NSF · $416k · 2018–2021
NSF · $1.2M · 2023–2027
NSF · $449k · 2012–2016
SHF: Small: T2T: A Framework for Amplifying Testing Resources
NSF · $492k · 2009–2013
NRI: INT: COLLAB: Raining Drones: Mid-Air Release & Recovery of Atmospheric Sensing Systems
NSF · $404k · 2019–2023
Frequent coauthors
- Matthew B. Dwyer · 55 shared
- Gregg Rothermel · 35 shared
- Carrick Detweiler · 28 shared
- David Shriver · 16 shared
- David S. Rosenblum, George Mason University · 15 shared
- Kathryn T. Stolee, North Carolina State University · 14 shared
- John‐Paul Ore, North Carolina State University · 11 shared
- Trey Woodlief, University of Virginia · 11 shared
Awards & honors
- NSF CAREER Award
- IBM Innovation Award
- Google Faculty Research Award
- FSE Test of Time Award
- Five ACM SIGSOFT Distinguished Paper Awards