Jack W. Davidson

Professor, Computer Science · Director, Cyber Defense Program of Study

University of Virginia · Computer Science

Active 1980–2025

h-index: 37

Citations: 5.1k

Papers: 230 (20 in the last 5 years)

Funding: $3.6M

About

Jack W. Davidson is a Professor of Computer Science in the School of Engineering and Applied Science at the University of Virginia. He joined the faculty in 1981 after earning his Ph.D. in Computer Science from the University of Arizona. His research interests include compilers, computer security, programming languages, computer architecture, and embedded systems. Professor Davidson is the principal investigator on several ongoing grants focused on developing comprehensive methods for protecting software from malicious attacks. He is a Fellow of the ACM and a Life Fellow of the IEEE. His service to the professional community includes roles such as Associate Editor for ACM’s Transactions on Programming Languages and Systems and ACM’s Transactions on Architecture and Code Optimization. He served as Chair of ACM’s Special Interest Group on Programming Languages (SIGPLAN) from 2005 to 2007 and currently serves on the ACM Executive Council, chairing the ACM Digital Library Board. Professor Davidson is also a co-author of two best-selling introductory programming textbooks and has received awards such as the IEEE Taylor L. Booth Award for his efforts in transforming computer science education.

Research topics

Computer Science
Data Mining
Operating system
Programming language
Machine Learning
Artificial Intelligence
Computer Security
World Wide Web
Mathematics
Engineering
Computer network
Arithmetic

Selected publications

PHASE: Passive Human Activity Simulation Evaluation
ArXiv.org · 2025-07-17
Preprint · Open access · Senior author
Cybersecurity simulation environments, such as cyber ranges, honeypots, and sandboxes, require realistic human behavior to be effective, yet no quantitative method exists to assess the behavioral fidelity of synthetic user personas. This paper presents PHASE (Passive Human Activity Simulation Evaluation), a machine learning framework that analyzes Zeek connection logs and distinguishes human from non-human activity with over 90% accuracy. PHASE operates entirely passively, relying on standard network monitoring without any user-side instrumentation or visible signs of surveillance. All network activity used for machine learning is collected via a Zeek network appliance to avoid introducing unnecessary network traffic or artifacts that could disrupt the fidelity of the simulation environment. The paper also proposes a novel labeling approach that utilizes local DNS records to classify network traffic, thereby enabling machine learning analysis. Furthermore, we apply SHAP (SHapley Additive exPlanations) analysis to uncover temporal and behavioral signatures indicative of genuine human users. In a case study, we evaluate a synthetic user persona and identify distinct non-human patterns that undermine behavioral realism. Based on these insights, we develop a revised behavioral configuration that significantly improves the human-likeness of synthetic activity, yielding a more realistic and effective synthetic user persona.
CELEST: Federated Learning for Globally Coordinated Threat Detection
IEEE Transactions on Information Forensics and Security · 2025-01-01 · 3 citations
Article · Senior author
The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning in order to collaboratively train a global model across multiple clients who keep their data locally. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally-coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We also design a poisoning detection and mitigation method, DTrust, for federated learning in the collaborative threat detection domain. We deploy CELEST on two university networks and show that it is able to detect the malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected a set of 42 previously unknown malicious URLs and 20 malicious domains in one day, which were confirmed to be malicious by VirusTotal.
Helix++: A platform for efficiently securing software
arXiv (Cornell University) · 2023-04-10
Preprint · Open access · 1st author · Corresponding
The open-source Helix++ project improves the security posture of computing platforms by applying cutting-edge cybersecurity techniques to diversify and harden software automatically. A distinguishing feature of Helix++ is that it does not require source code or build artifacts; it operates directly on software in binary form--even stripped executables and libraries. This feature is key as rebuilding applications from source is a time-consuming and often frustrating process. Diversification breaks the software monoculture and makes attacks harder to execute as information needed for a successful attack will have changed unpredictably. Diversification also forces attackers to customize an attack for each target instead of attackers crafting an exploit that works reliably on all similarly configured targets. Hardening directly targets key attack classes. The combination of diversity and hardening provides defense-in-depth, as well as a moving target defense, to secure the Nation's cyber infrastructure.
Zipr: A High-Impact, Robust, Open-source, Multi-platform, Static Binary Rewriter
arXiv (Cornell University) · 2023-12-01
Preprint · Open access · Senior author
Zipr is a tool for static binary rewriting, first published in 2016. Zipr was engineered to support arbitrary program modification with an emphasis on low overhead, robustness, and flexibility to perform security enhancements and instrumentation. Originally targeted to Linux x86-32 binaries, Zipr now supports 32- and 64-bit binaries for X86, ARM, and MIPS architectures, as well as preliminary support for Windows programs. These features have helped Zipr make a dramatic impact on research. It was first used in the DARPA Cyber Grand Challenge to take second place overall, with the best security score of any participant. Zipr has now been used in a variety of research areas by both the original authors and third parties. Zipr has also led to publications in artificial diversity, program instrumentation, program repair, fuzzing, autonomous vehicle security, research computing security, as well as directly contributing to two student dissertations. The open-source repository has accepted patches from several external authors, demonstrating the impact of Zipr beyond the original authors.
Sentinel: A Multi-institution Enterprise Scale Platform for Data-driven Cybersecurity Research
2022-10-01 · 1 citation
Article · Senior author
Current cybersecurity research is constrained by the general scarcity of large, realistic, labeled network traffic datasets. To address said scarcity, this paper introduces Sentinel: a multi-enterprise scientific instrument developed to support data-driven cybersecurity research. Sentinel provides researchers access to virtual computing infrastructure and petabytes of data collected over several years from network sensors at two large, disjoint educational institutions - the University of Virginia and Virginia Tech. The network dataset is supplemented by multi-modal malware activity logs generated by attack recreation exercises which realistically integrate ground truth into collected edge sensor data. To mitigate risks associated with providing access to enterprise network sensor logs, Sentinel uses a combination of a code-to-data policy, data usage agreements, and pattern-preserving anonymization. Sentinel has been used as part of a government-funded effort to investigate new machine learning algorithms, cybersecurity forensics, and data retention techniques.
High-performance reliable network-multicast over a trial deployment
Cluster Computing · 2022-02-04 · 1 citation
Article · Open access · Senior author
A continuing trend in many scientific disciplines is the growth in the volume of data collected by scientific instruments and the desire to rapidly and efficiently distribute this data to the scientific community. As both the data volume and number of subscribers grow, a reliable network multicast is a promising approach to alleviate the demand for the bandwidth needed to support efficient data distribution to multiple, geographically distributed research communities. In prior work, we identified the need for a reliable network multicast: scientists engaged in atmospheric research subscribing to meteorological file-streams. An application called Local Data Manager (LDM) is used to disseminate meteorological data to hundreds of subscribers. This paper presents a high-performance, reliable network multicast solution, Dynamic Reliable File-Stream Multicast Service (DRFSM), and describes a trial deployment comprising eight university campuses connected via Research-and-Education Networks (RENs) and Internet2 and a DRFSM-enabled LDM (LDM7). Using this deployment, we evaluated the DRFSM architecture, which uses network multicast with a reliable transport protocol, and leverages Layer-2 (L2) multipoint Virtual LAN (VLAN/MPLS). A performance monitoring system was developed to collect the real-time performance of LDM7. The measurements showed that our proof-of-concept prototype worked significantly better than the current production LDM (LDM6) in two ways. First, LDM7 distributes data faster than LDM6. With six subscribers and a 100 Mbps bandwidth limit setting, an almost 22-fold improvement in delivery time was observed with LDM7. Second, LDM7 significantly reduces the bandwidth requirement needed to deliver data to subscribers. LDM7 needed 90% less bandwidth than LDM6 to achieve a 20 Mbps average throughput across four subscribers.
START: A Framework for Trusted and Resilient Autonomous Vehicles (Practical Experience Report)
2022-10-01 · 5 citations
Article
From delivering groceries and vital medical supplies to driving trucks and passenger vehicles, society is becoming increasingly reliant on autonomous vehicles (AVs). It is therefore vital that these systems be resilient to adversarial actions, perform mission-critical functions despite known and unknown vulnerabilities, and protect and repair themselves during or after operational failures and cyber-attacks. While techniques have been proposed to address individual aspects of software resilience, vulnerability assessment, automated repair, and invariant detection, there is no approach that provides end-to-end trusted and resilient mission operation and repair on AVs. In this paper, we describe our experience of building START (Software Techniques for Automated Resilience and Trust), a framework that provides increased resilience, accurate vulnerability assessment, and trustworthy post-repair operation in autonomous vehicles. We combine techniques from binary analysis and rewriting, runtime monitoring and verification, automated program repair, and invariant detection that cooperatively detect and eliminate a swath of software security vulnerabilities in cyberphysical systems. We evaluate our framework using an autonomous vehicle simulation platform, demonstrating its holistic applicability to AVs.
CELEST: Federated Learning for Globally Coordinated Threat Detection
arXiv (Cornell University) · 2022-05-23 · 6 citations
Preprint · Open access · Senior author
The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily, and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, which is one of the most commonly used protocols for malware dissemination and communication. CELEST leverages federated learning in order to collaboratively train a global model across multiple clients who keep their data locally, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally-coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We also design a poisoning detection and mitigation method, DTrust, specifically designed for federated learning in the collaborative threat detection domain. DTrust successfully detects poisoning clients using the feedback from participating clients to investigate and remove them from the training process. We deploy CELEST on two university networks and show that it is able to detect the malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected a set of 42 previously unknown malicious URLs and 20 malicious domains in one day, which were confirmed to be malicious by VirusTotal.
PORTFILER: Port-Level Network Profiling for Self-Propagating Malware Detection
2021 · 10 citations
- Computer Science
- Artificial Intelligence
Recent self-propagating malware (SPM) campaigns compromised hundreds of thousands of victim machines on the Internet. It is challenging to detect these attacks in their early stages, as adversaries utilize common network services, use novel techniques, and can evade existing detection mechanisms. We propose PORTFILER (PORT-Level Network Traffic ProFILER), a new machine learning system applied to network traffic for detecting SPM attacks. PORTFILER extracts port-level features from the Zeek connection logs collected at a border of a monitored network, applies anomaly detection techniques to identify suspicious events, and ranks the alerts across ports for investigation by the Security Operations Center (SOC). We propose a novel ensemble methodology for aggregating individual models in PORTFILER that increases resilience against several evasion strategies compared to standard ML baselines. We extensively evaluate PORTFILER on traffic collected from two university networks, and show that it can detect SPM attacks with different patterns, such as WannaCry and Mirai, and performs well under evasion. Ranking across ports achieves precision over 0.94 and false positive rates below $8 \times 10^{-4}$ in the top 100 highly ranked alerts. When deployed on the university networks, PORTFILER detected anomalous SPM-like activity on one of the campus networks, confirmed by the university SOC as malicious. PORTFILER also detected a Mirai attack recreated on the two university networks with higher precision and recall than deep-learning based autoencoder methods.
BigMap: Future-proofing Fuzzers with Efficient Large Maps
2021-06-01 · 5 citations
Article
Coverage-guided fuzzing is a powerful technique for finding security vulnerabilities and latent bugs in software. Such fuzzers usually store the coverage information in a small bitmap. Hash collision within this bitmap is a well-known issue and can reduce fuzzers' ability to discover potential bugs. Prior works noted that mitigating collisions by naïvely enlarging the hash space leads to an unacceptable runtime overhead. This paper describes BigMap, a two-level hashing scheme that enables using an arbitrarily large coverage bitmap with low overhead. The key observation is that the overhead stems from frequent operations performed on the full bitmap, although only a fraction of the map is actively used. BigMap condenses these scattered active regions on a second bitmap and limits the operations only on that condensed area. We implemented our approach on top of the popular fuzzer AFL and conducted experiments on 19 benchmarks from FuzzBench and OSS-Fuzz. The results indicate that BigMap does not suffer from increased runtime overhead even with large map sizes. Compared to AFL, BigMap achieved an average of 4.5x higher test case generation throughput for a 2MB map and 33.1x for an 8MB map. The throughput gain for the 2MB map increased further to 9.2x with parallel fuzzing sessions, indicating superior scalability of BigMap. More importantly, BigMap's compatibility with most coverage metrics, along with its efficiency on bigger maps, enabled exploring aggressive compositions of expensive coverage metrics and fuzzing algorithms, uncovering 33% more unique crashes. BigMap makes using large bitmaps practical and enables researchers to explore a wider design space of coverage metrics.
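The two-level idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper builds on AFL, which is written in C); the class name `TwoLevelCoverageMap`, its methods, and the 255 counter cap are hypothetical choices made only for illustration. The first level is a large index table keyed by the edge hash, and the second level is a densely packed list of hit counters, so per-test-case operations such as resetting touch only the slots that have actually been used.

```python
class TwoLevelCoverageMap:
    """Illustrative two-level coverage map (hypothetical, not BigMap's code)."""

    def __init__(self, map_size):
        self.map_size = map_size
        # First level: one entry per possible edge key; -1 means "never seen".
        self.index = [-1] * map_size
        # Second level: hit counters packed densely in first-seen order.
        self.counts = []

    def record(self, edge_key):
        """Count one hit for an edge, allocating a condensed slot on first use."""
        pos = edge_key % self.map_size
        slot = self.index[pos]
        if slot == -1:
            slot = len(self.counts)
            self.index[pos] = slot
            self.counts.append(0)
        # Saturate at 255, assuming byte-sized counters.
        self.counts[slot] = min(self.counts[slot] + 1, 255)

    def reset(self):
        """Clear counters between test cases; cost depends on used slots only."""
        for i in range(len(self.counts)):
            self.counts[i] = 0

    def used_slots(self):
        """Number of condensed slots allocated so far."""
        return len(self.counts)


if __name__ == "__main__":
    cov = TwoLevelCoverageMap(map_size=1 << 16)
    for key in (7, 7, 70000, 7):
        cov.record(key)
    print(cov.counts, cov.used_slots())
    cov.reset()
    print(cov.counts)
```

In this sketch the large first-level table is written only when a new edge appears, while the frequent whole-map pass (`reset`) scales with the number of distinct edges seen rather than with `map_size`, which is the property the abstract attributes to condensing the active regions.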

Recent grants

Collaborative Research: CRI: A Community Resource Development Project for a Retargetable and Reconfigurable Software Dynamic Translation Infrastructure
NSF · $107k · 2006–2009
CC* Integration: Enhancement and deployment of LDM7 for scientific data distribution
NSF · $1.0M · 2017–2021
NGS: Collaborative Research: Adapting Program Code Continuously and Aggressively
NSF · $532k · 2003–2007
Experimental Partnership - Comprehensive Retargetable Embedded Systems Software Development Environment
NSF · $2.0M · 2000–2006

Frequent coauthors

Jason D. Hiser
70 shared
Anh Nguyen‐Tuong
31 shared
Bruce R. Childers
University of Pittsburgh
31 shared
David Whalley
Royal North Shore Hospital
28 shared
Sang Lyul
Karlsruhe University of Education
25 shared
Jan Van Leeuwen
Utrecht University
25 shared
Gerhard Goos
25 shared
Mary Lou Soffa
University of Virginia
19 shared

Education

Ph.D., Computer Science
University of Arizona

Awards & honors

IEEE Life Fellow 2020
ACM Distinguished Service Award 2010
ACM Fellow 2008
IEEE Computer Society Taylor L. Booth Award 2008
ACM Undergraduate Teaching Award, School of Engineering and…
