Adam Bates

Associate Professor

University of Illinois Urbana-Champaign · Computer Science

Active 2008–2025

h-index: 28

Citations: 2.5k

Papers: 88 (43 in the last 5 years)

Funding: $1.9M (1 active grant)

Faculty page Lab page


About

Adam Bates is an Associate Professor in the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign. He earned a Ph.D. in Computer Science from the University of Florida in 2016 with a thesis on designing and leveraging trustworthy provenance-aware systems. He also holds a Master of Science in Computer Science from the University of Oregon and a Bachelor of Science in Computer Science from the University of Maryland. His research interests include intrusion detection systems; digital privacy in everyday user technologies; and threat detection, investigation, and response. His work focuses primarily on operating systems security and privacy, including scalable, efficient, and secure alert triage for endpoint detection and response, as well as privacy concerns in fitness tracking and smart home devices. Dr. Bates publishes extensively in security and privacy conference proceedings.

Research topics

Computer Science
Data Mining
Physics
Theoretical computer science
Materials science
Algorithm
Geology
Nanotechnology

Selected publications

What We Talk About When We Talk About Logs: Understanding the Effects of Dataset Quality on Endpoint Threat Detection Research
2025-05-12 · 4 citations
Article · Senior author
Endpoint threat detection research hinges on the availability of worthwhile evaluation benchmarks, but experimenters' understanding of the contents of benchmark datasets is often limited. Typically, attention is paid only to the realism of attack behaviors, which make up a small percentage of the audit logs in the dataset, while other characteristics of the data remain inscrutable and unknown. We propose a new set of questions for what to talk about when we talk about logs (i.e., datasets). What activities are in the dataset? We introduce a novel visualization that succinctly represents the totality of 100+ GB datasets by plotting the occurrence of provenance graph neighborhoods in a time series. How synthetic is the background activity? We perform autocorrelation analysis of provenance neighborhoods in the training split to identify process behaviors that occur at predictable intervals in the test split. Finally, how conspicuous is the malicious activity? We quantify the proportion of attack behaviors that are observed as benign neighborhoods in the training split as compared to previously unseen attack neighborhoods. We then validate these questions by profiling the classification performance of state-of-the-art intrusion detection systems (R-CAID, FLASH, KAIROS, GNN) against a battery of public benchmark datasets (DARPA Transparent Computing and OpTC, ATLAS, ATLASv2). We demonstrate that synthetic background activities dramatically inflate True Negative Rates, while conspicuous malicious activities artificially boost True Positive Rates. Further, by explicitly controlling for these factors, we provide a more holistic picture of classifier performance. This work will elevate the dialogue surrounding threat detection datasets and will increase the rigor of threat detection experiments.
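The "how synthetic is the background activity?" question can be illustrated with a minimal sketch, assuming each provenance neighborhood has already been reduced to a count of occurrences per fixed time bin. The `autocorrelation` and `periodic_lag` helpers and the 0.5 threshold are hypothetical choices for illustration, not the paper's analysis pipeline: a neighborhood whose counts correlate strongly with themselves at some lag recurs on a schedule, which is the kind of predictable background behavior the abstract describes.

```python
from typing import Optional, Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of `series` at `lag`, normalized by total variance."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no periodic structure to detect
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def periodic_lag(
    counts: Sequence[float], max_lag: int, threshold: float = 0.5
) -> Optional[int]:
    """Return the lag with the highest autocorrelation above `threshold`, or None.

    `counts` is a hypothetical per-time-bin occurrence count for one
    provenance neighborhood; a returned lag suggests the behavior recurs
    at a predictable interval.
    """
    best_lag, best_r = None, threshold
    for lag in range(1, min(max_lag, len(counts) - 1) + 1):
        r = autocorrelation(counts, lag)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


# A neighborhood that fires in every 4th bin, like a scheduled job.
cron_like = [5, 0, 0, 0] * 6
print(periodic_lag(cron_like, max_lag=8))  # 4
```

A neighborhood flagged this way in the training split is easy to predict in the test split, which is why such activity can make a detector's True Negative Rate look better than it would on organic traffic.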
Social context prevents heat hormetic effects against mutagens during fish development
FEBS Letters · 2025-04-23 · 2 citations
Article · Open access
Since stress can be transmitted to congeners via social metabolites, it is paramount to understand how the social context of abiotic stress influences aquatic organisms' responses to global changes. Here, we integrated the transcriptomic and phenotypic responses of zebrafish embryos to a UV damage/repair assay following scenarios of heat stress, its social context and their combination. Heat stress preceding UV exposure had a hormetic effect through the cellular stress response and DNA repair, rescuing and/or protecting embryos from UV damage. However, experiencing heat stress within a social context negated this molecular hormetic effect and lowered larval fitness. We discuss the molecular basis of interindividual chemical transmission within animal groups as another layer of complexity to organisms' responses to environmental stressors.
"I'm not as afraid as a woman might be about sharing my exact location:" On the Intersection of Identity and Privacy Concerns in Fitness Tracking
2025-04-25 · 1 citation
Article · Open access · Senior author
Users' perceptions of fitness tracking privacy are a subject of active study, but how do various aspects of social identity inform these perceptions? We conducted an online survey (N=322) that explores the influence of identity on fitness tracking privacy perceptions and practices, considering participants' gender, race, age, and whether or not they identify as LGBTQ*. Participants reported how comfortable they felt sharing fitness data, commented on whether they believed their identity impacted this comfort, and brainstormed several data sharing risks and a possible mitigation for each risk. For each surveyed dimension of social identity, we find one or more reliable effects on participants' level of comfort sharing fitness data, specifically when considering institutional groups like employers, insurers, and advertisers. Further, 64% of participants indicate that at least one of their identity characteristics informs their comfort. We also find evidence that the perceived risks of sharing fitness data vary by identity, but do not find evidence of differences in the strategies used to manage these risks. This work highlights a path towards reasoning about the privacy challenges of fitness tracking with respect for the lived experiences of all users. CCS Concepts: Security and privacy → Social aspects of security and privacy; Human and societal aspects of security and privacy.
Carbon Filter: Scalable, Efficient, and Secure Alert Triage for Endpoint Detection & Response
2025-10-19
Article · Senior author
Endpoint Detection & Response (EDR) products detect threats by pattern matching endpoint telemetry against behavioral rules that describe potentially malicious behavior. However, EDR can suffer from high false positive rates that distract from actual attacks, leading to an “alert fatigue” problem. While provenance-based alert triage techniques have shown promise, historical provenance analysis is prohibitively slow when applied to the stream-based event processing pipelines that dominate industry today; provenance-based systems may take over a minute to inspect a single alert, while individual EDR customers can face tens of millions of alerts per day. At present, these approaches cannot scale to production environments. We present Carbon Filter, an automated alert triage mechanism that reduces false alerts by upwards of 82% and is already in use by thousands of Carbon Black EDR customers today. Our key insight is that the vast majority of false alerts are triggered by programs that share a common initiation context, and thus the specific false alerts associated with an initiation context can be identified. However, rather than turning to costly provenance analysis, we hypothesize that it is sufficient to use the command line arguments of alert-triggering processes as the initiation context. By prioritizing speed in similarity-preserving hashing, clustering, and search, we demonstrate that our approach scales to millions of alerts per hour (>5K/sec). In evaluations on customer alert data, we demonstrate that Carbon Filter can identify 82% of false alerts, nearly a 6-fold improvement in signal-to-noise ratio. Further, when compared to provenance-based approaches, we show that Carbon Filter (AUC = 0.94) actually outperforms NoDoze (AUC = 0.60) and RapSheet (AUC = 0.90) while reducing analysis time by 5,064x and 26,723x, respectively.
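The two headline figures in the Carbon Filter abstract (82% of false alerts identified, nearly a 6-fold signal-to-noise improvement) are consistent under a simple reading. This worked check assumes, as an illustration rather than the paper's stated definition, that signal-to-noise ratio is true alerts T divided by false alerts F and that triage suppresses no true alerts:

```latex
\mathrm{SNR}_{\text{before}} = \frac{T}{F},
\qquad
\mathrm{SNR}_{\text{after}} = \frac{T}{(1 - 0.82)\,F} = \frac{T}{0.18\,F},
\qquad
\frac{\mathrm{SNR}_{\text{after}}}{\mathrm{SNR}_{\text{before}}} = \frac{1}{0.18} \approx 5.56
```

Under these assumptions the gain is roughly 5.6-fold, matching the "nearly 6-fold" figure; if some true alerts were also suppressed, the gain would be correspondingly smaller.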
R-CAID: Embedding Root Cause Analysis within Provenance-based Intrusion Detection
2024-05-19 · 22 citations
Article · Senior author
In modern enterprise security, endpoint detection products fire an alert when process activity matches known attack behavior patterns. Human analysts then perform Root Cause Analysis (RCA) over event logs to determine if the alert is indicative of an actual attack. Data provenance can help automate RCA by representing event logs as causal dependency graphs; in fact, researchers are now examining whether provenance-based anomaly detection should replace pattern-based detection altogether. Unfortunately, we observe that current approaches leverage off-the-shelf graph embedding techniques that are unable to associate events with their root causes. This shortcoming not only fails to capitalize on the RCA capabilities of provenance, but also leaves provenance-based IDS vulnerable to mimicry and evasion attacks. This work presents the design and implementation of R-CAID, a novel approach to incorporate RCA into provenance-based IDS. R-CAID precomputes each node’s root causes during graph construction, then directly links those nodes to their root causes during embedding. Further, R-CAID’s classification model is node/process-level, rather than graph/system-level, bringing it more in line with the precision of commercial systems. Under a passive adversary model, we find that R-CAID consistently outperforms baseline graph neural networks, sequence-based log IDS, and even a commercial endpoint detection system. Under a white-box active adversary model, R-CAID maintains a high level of performance (e.g., for DARPA Theia, 0.94 AUC adversarial, down from 0.99 passive). R-CAID achieves this by associating each system entity with its immutable and unforgeable root causes, preventing adversaries from masquerading as legitimate processes. This work is thus the first to demonstrate the promise of provenance-based IDS in a manner that avoids the pitfalls of mimicry and evasion.
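The R-CAID abstract says root causes are precomputed during graph construction. A minimal sketch of that idea, assuming a "root cause" is an ancestor with no causal parents and that events arrive in causal order (the `RootCauseIndex` class and the node names are hypothetical; R-CAID's actual definition and its embedding step are not shown):

```python
from typing import Dict, FrozenSet, Iterable


class RootCauseIndex:
    """Hypothetical index mapping each provenance node to its root causes.

    A "root cause" is assumed here to be an ancestor with no causal parents
    (e.g., a boot-time process or an inbound network socket). Nodes must be
    added in causal order, so every parent is already indexed.
    """

    def __init__(self) -> None:
        self._roots: Dict[str, FrozenSet[str]] = {}

    def add_node(self, node: str, parents: Iterable[str] = ()) -> FrozenSet[str]:
        parent_list = list(parents)
        if node in self._roots:
            raise ValueError(f"node {node!r} already indexed")
        missing = [p for p in parent_list if p not in self._roots]
        if missing:
            raise KeyError(f"parents not yet indexed: {missing}")
        if parent_list:
            # A child inherits the union of its parents' root causes, so the
            # answer is computed once at insertion instead of by graph traversal.
            roots = frozenset().union(*(self._roots[p] for p in parent_list))
        else:
            roots = frozenset({node})  # a parentless node is its own root cause
        self._roots[node] = roots
        return roots

    def root_causes(self, node: str) -> FrozenSet[str]:
        return self._roots[node]


index = RootCauseIndex()
index.add_node("systemd")
index.add_node("sshd", ["systemd"])
index.add_node("bash", ["sshd"])
index.add_node("socket:203.0.113.7")
index.add_node("curl", ["bash", "socket:203.0.113.7"])
print(sorted(index.root_causes("curl")))  # ['socket:203.0.113.7', 'systemd']
```

Because a node's root set is fixed at insertion, a process cannot later acquire a different, legitimate-looking ancestry, which is the intuition behind the abstract's claim that root causes are immutable.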
DrSec: Flexible Distributed Representations for Efficient Endpoint Security
2024-05-19 · 5 citations
Article
The increasing complexity of attacks has given rise to varied security applications tackling profound tasks, ranging from alert triage to attack reconstruction. Yet, security products, such as Endpoint Detection and Response, bring together applications that are developed in isolation, trigger many false positives, miss actual attacks, and produce limited labels useful in supervised learning schemes. To address these challenges, we propose DrSec—a system employing self-supervised learning to pre-train foundation language models (LMs) that ingest event-sequence data and emit distributed representations for processes. Once pre-trained, the LMs can be adapted to solve different downstream tasks with limited to no supervision, helping unify the currently fractured application ecosystem. We trained DrSec with two LM types on a real-world dataset containing ∼91M processes and ∼2.55B events, and tested it in three application domains. We found that DrSec enables accurate, unsupervised process identification; outperforms leading methods on alert triage to reduce alert fatigue (e.g., 75.11% vs. ≤64.31% precision-recall area under curve); and accurately learns expert-developed rules, allowing tuning incident detectors to control false positives and negatives.
ORCHID: Streaming Threat Detection over Versioned Provenance Graphs
arXiv (Cornell University) · 2024-08-23
Preprint · Open access
While Endpoint Detection and Response (EDR) systems are able to efficiently monitor threats by comparing static rules to the event stream, their inability to incorporate past system context leads to high rates of false alarms. Recent work has demonstrated Provenance-based Intrusion Detection Systems (Prov-IDS) that can examine the causal relationships between abnormal behaviors to improve threat classification. However, employing these Prov-IDS in practical settings remains difficult: state-of-the-art neural-network-based systems are only fast in a fully offline deployment model that increases attacker dwell time, while simultaneously using simplified and less accurate provenance graphs to reduce memory consumption. Thus, today's Prov-IDS cannot operate effectively in the real-time streaming setting required for commercial EDR viability. This work presents the design and implementation of ORCHID, a novel Prov-IDS that performs fine-grained detection of process-level threats over a real-time event stream. ORCHID takes advantage of the unique immutable properties of versioned provenance graphs to iteratively embed the entire graph in a sequential RNN model while consuming only a fraction of the computation and memory costs. We evaluate ORCHID on four public datasets, including DARPA TC, and show that ORCHID can provide competitive classification performance while eliminating detection lag and reducing memory consumption by two orders of magnitude.
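The "immutable properties of versioned provenance graphs" mentioned in the ORCHID abstract can be illustrated with a minimal sketch. It assumes a common versioning scheme in which every event that flows information into an entity creates a new version of that entity; the `VersionedProvenanceGraph` class is hypothetical, and ORCHID's RNN embedding is not modeled:

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (entity name, version number)


class VersionedProvenanceGraph:
    """Hypothetical versioned provenance graph.

    Every event that flows information into an entity creates a new version
    of that entity, so a node's incoming edges are fixed when it is created
    and never change afterwards. This keeps the graph acyclic even when
    entities exchange data back and forth.
    """

    def __init__(self) -> None:
        self._current: Dict[str, int] = {}
        self.edges: List[Tuple[Node, Node]] = []

    def current(self, entity: str) -> Node:
        return (entity, self._current.setdefault(entity, 0))

    def add_event(self, src: str, dst: str) -> Node:
        """Record information flowing from `src` into `dst`; return dst's new version."""
        src_node = self.current(src)
        old_dst = self.current(dst)
        new_dst = (dst, old_dst[1] + 1)
        self._current[dst] = new_dst[1]
        self.edges.append((old_dst, new_dst))  # version chain
        self.edges.append((src_node, new_dst))  # information flow
        return new_dst


graph = VersionedProvenanceGraph()
graph.add_event("bash", "notes.txt")  # bash writes the file
graph.add_event("notes.txt", "bash")  # bash reads it back
for edge in graph.edges:
    print(edge)
```

Without versioning, the write and read-back would form a cycle between `bash` and `notes.txt`. With versioning, the read produces a new node `("bash", 1)`, and no existing node ever gains a new ancestor, so anything computed from a node's ancestry can be computed once and reused as the stream grows.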
Identifying Privacy Concerns in Smarthome Environments through Behavior Monitoring
2024-01-01
Article · Open access · Senior author
Carbon Filter: Real-time Alert Triage Using Large Scale Clustering and Fast Search
arXiv (Cornell University) · 2024-05-07
Preprint · Open access
"Alert fatigue" is one of the biggest challenges faced by the Security Operations Center (SOC) today, with analysts spending more than half of their time reviewing false alerts. Endpoint detection products raise alerts by pattern matching on event telemetry against behavioral rules that describe potentially malicious behavior, but can suffer from high false positives that distract from actual attacks. While alert triage techniques based on data provenance may show promise, these techniques can take over a minute to inspect a single alert, while EDR customers may face tens of millions of alerts per day; the current reality is that these approaches aren't nearly scalable enough for production environments. We present Carbon Filter, a statistical learning based system that dramatically reduces the number of alerts analysts need to manually review. Our approach is based on the observation that false alert triggers can be efficiently identified and separated from suspicious behaviors by examining the process initiation context (e.g., the command line) that launched the responsible process. Through the use of fast-search algorithms for training and inference, our approach scales to millions of alerts per day. Through batching queries to the model, we observe a theoretical maximum throughput of 20 million alerts per hour. Based on the analysis of tens of million alerts from customer deployments, our solution resulted in a 6-fold improvement in the Signal-to-Noise ratio without compromising on alert triage performance.
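The Carbon Filter abstracts attribute its speed to similarity-preserving hashing and fast search over process command lines, without naming the hash. As a minimal sketch, assuming MinHash over whitespace-separated tokens (a standard similarity-preserving hash swapped in here for illustration; the function names, the 64-hash signature, and the 0.4 threshold are all hypothetical), a new alert's command line can be compared against command lines previously labeled as benign triggers:

```python
import hashlib
from typing import Dict, List, Optional

NUM_HASHES = 64  # signature length; more hashes give a tighter similarity estimate


def _token_hash(seed: int, token: str) -> int:
    # Derive one of NUM_HASHES independent-looking hash functions from a seed.
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(cmdline: str, num_hashes: int = NUM_HASHES) -> List[int]:
    """MinHash signature of a command line, treated as a set of whitespace tokens."""
    tokens = set(cmdline.split())
    if not tokens:
        raise ValueError("empty command line")
    return [min(_token_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of matching signature slots; estimates Jaccard similarity of the token sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(
    query: List[int], known: Dict[str, List[int]], threshold: float = 0.4
) -> Optional[str]:
    """Linear scan over known benign contexts; a stand-in for a real fast-search index."""
    best_label, best_score = None, threshold
    for label, signature in known.items():
        score = estimated_jaccard(query, signature)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


known_benign = {
    "nightly-backup": minhash("python backup.py --dest /mnt/a --verbose --compress"),
}
alert_cmdline = "python backup.py --dest /mnt/b --verbose --compress"
print(best_match(minhash(alert_cmdline), known_benign))
```

The two example command lines share 5 of 7 distinct tokens (true Jaccard similarity of about 0.71), so the signature comparison should land well above the threshold, while an unrelated command line shares no tokens and matches nothing. A production system would replace the linear scan in `best_match` with a sub-linear index such as locality-sensitive hashing bands; that part is not sketched here.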
GRASP: Hardening Serverless Applications through Graph Reachability Analysis of Security Policies
2024-05-08 · 11 citations
Article · Open access
Serverless computing is supplanting past versions of cloud computing as the easiest way to rapidly prototype and deploy applications. However, the reentrant and ephemeral nature of serverless functions only exacerbates the challenge of correctly specifying security policies. Unfortunately, with role-based access control solutions like Amazon Identity and Access Management (IAM) already suffering from pervasive misconfiguration problems, the likelihood of policy failures in serverless applications is high.
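The graph reachability analysis named in GRASP's title can be sketched under stated assumptions: policies are flattened into hypothetical (principal, action, target) allow statements, serverless functions and resources become graph nodes, and a breadth-first search from an internet-facing function lists everything an attacker who compromises it could transitively touch. The helper names and the `fn:`/`bucket:`/`table:` prefixes are illustrative only, not GRASP's or AWS IAM's actual data model:

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical allow statement: (principal, action, target).
Statement = Tuple[str, str, str]


def build_graph(statements: Iterable[Statement]) -> Dict[str, List[str]]:
    """Add an edge principal -> target for every allowed action."""
    graph: Dict[str, List[str]] = {}
    for principal, _action, target in statements:
        graph.setdefault(principal, []).append(target)
    return graph


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every node `start` can reach, directly or transitively."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


policy = [
    ("fn:public-api", "lambda:InvokeFunction", "fn:resize-image"),
    ("fn:resize-image", "s3:PutObject", "bucket:thumbnails"),
    ("fn:resize-image", "dynamodb:*", "table:users"),  # overly broad grant
]
exposed = reachable(build_graph(policy), "fn:public-api")
print("table:users" in exposed)  # True
```

The example shows why the reentrant structure of serverless applications matters: the public function has no direct permission on `table:users`, yet it can reach the table by invoking a second function whose policy is too broad.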

Recent grants

CAREER: Scalable Information Flow Monitoring and Enforcement through Data Provenance Unification
NSF · $528k · 2018–2024
SaTC: CORE: Medium: Principled Foundations for the Design and Evaluation of Graph-Based Host Intrusion Detection Systems
NSF · $1.2M · 2021–2026
CRII: SaTC: Transparent Capture and Aggregation of Secure Data Provenance for Smart Devices
NSF · $175k · 2017–2020

Frequent coauthors

Wajih Ul Hassan
University of Virginia
27 shared
Kevin Butler
22 shared
Patrick Traynor
University of Florida
15 shared
Pedro Beltrán-Álvarez
Hull York Medical School
14 shared
Riccardo Paccagnella
Carnegie Mellon University
13 shared
Nolen Scaife
University of Colorado Boulder
13 shared
Michael Bailey
13 shared
Kathleen Bulmer
Hull York Medical School
12 shared

Labs

STS Lab · PI
The STS Lab Team

Awards & honors

28th International Symposium on Research in Attacks, Intrusi…
46th IEEE Symposium on Security and Privacy (S&P'25) (2025)
ACM CHI Conference on Human Factors in Computing Systems (20…
33rd USENIX Security Symposium (Security'24) (2024)
45th IEEE Symposium on Security and Privacy (Oakland'24) (20…
