Antonia Zhai
Verified · University of Minnesota · Computer Science and Engineering
Active 2000–2026
About
Antonia Zhai is an Associate Professor in the Department of Computer Science & Engineering at the University of Minnesota Twin Cities. She joined the department in 2005 as an assistant professor and was promoted to associate professor in 2011. She holds a Ph.D. in Computer Science from Carnegie Mellon University (2005), and a Master of Applied Science (1998) and a Bachelor of Applied Science (1996), both in Computer Engineering, from the University of Toronto. Before her current position, she was a graduate research assistant at Carnegie Mellon University and the University of Toronto, and she consulted for Intel Corporation from 2012 to 2013. Her research focuses on developing novel compiler optimizations and architecture features that improve processor performance and enhance non-performance features such as programmability, security, testability, and reliability. Her recent research projects are supported by grants from the National Science Foundation. Her recognitions include the IBM Faculty Award in 2007 and the ICCD Best Paper Award in 2008. She teaches courses on computer architecture, compilers, and quantum computing.
Research topics
- Computer Science
- Operating system
- Embedded system
- Parallel computing
- Computer network
- Computer architecture
- Distributed computing
Selected publications
Supporting Secured Integration of Microarchitectural Defenses
ArXiv.org · 2026-01-08
article · Open access
There has been a plethora of microarchitectural-level attacks leading to many proposed countermeasures. This has created an unexpected and unaddressed security issue where naive integration of those defenses can potentially lead to security vulnerabilities. This occurs when one defense changes an aspect of a microarchitecture that is crucial for the security of another defense. We refer to this problem as a microarchitectural defense assumption violation (MDAV). We propose a two-step methodology to screen for potential MDAVs in the early stage of integration. The first step is to design and integrate a composed model, guided by bounded model checking of security properties. The second step is to implement the model concretely on a simulator and to evaluate with simulated attacks. As a contribution supporting the first step, we propose an event-based modeling framework, called Maestro, for testing and evaluating microarchitectural models with integrated defenses. In our evaluation, Maestro reveals MDAVs (8), supports compact expression (~15x Alloy LoC ratio), enables semantic composability and eliminates performance degradations (>100x). As a contribution supporting the second step, we use an event-based simulator (GEM5) for investigating integrated microarchitectural defenses. We show that a covert channel attack is possible on a naively integrated implementation of some state-of-the-art defenses, and a repaired implementation using our integration methodology is resilient to the attack.
Shield Bash: Abusing Defensive Coherence State Retrieval to Break Timing Obfuscation
ArXiv.org · 2025-04-14
preprint · Open access
Microarchitectural attacks are a significant concern, leading to many hardware-based defense proposals. However, different defenses target different classes of attacks, and their impact on each other has not been fully considered. To raise awareness of this problem, we study an interaction between two state-of-the-art defenses in this paper, timing obfuscations of remote cache lines (TORC) and delaying speculative changes to remote cache lines (DSRC). TORC mitigates cache-hit based attacks and DSRC mitigates speculative coherence state change attacks. We observe that DSRC enables coherence information to be retrieved into the processor core, where it is out of the reach of timing obfuscations to protect. This creates an unforeseen consequence that redo operations can be triggered within the core to detect the presence or absence of remote cache lines, which constitutes a security vulnerability. We demonstrate that a new covert channel attack is possible using this vulnerability. We propose two ways to mitigate the attack, whose performance varies depending on an application's cache usage. One way is to never send remote exclusive coherence state (E) information to the core even if it is created. The other way is to never create a remote E state, which is responsible for triggering redos. We demonstrate the timing difference caused by this microarchitectural defense assumption violation using GEM5 simulations. Performance evaluation of the two fixes on SPECrate 2017 and PARSEC benchmarks shows less than 32% average overhead across both sets of benchmarks. The repair that prevents the creation of remote E state has less than 2.8% average overhead.
DeCOS: Data-Efficient Reinforcement Learning for Compiler Optimization Selection Ignited by LLM
2025-06-08
article · Senior author
Non-Fusion Based Coherent Cache Randomization Using Cross-Domain Accesses
2024-06-28
article
Randomization has proven to be an effective defense against conflict-based side-channel attacks in a shared cache. It improves security by assigning a unique randomization scheme to each security domain, e.g., through a different hashing function. However, if two domains have shared data, the domains must be fused in order to guarantee correctness (i.e., data coherence). Such domain fusion significantly reduces the effectiveness of randomization and weakens its security protection.
Interleaved Function Stream Execution Model for Cache-Aware High-Speed Stateful Packet Processing
2024-07-23
article
The evolving network infrastructure, particularly the 5G core network, is increasingly adopting cloud technologies. This shift brings to the forefront the challenge of meeting the demanding per-packet processing requirements posed by multi-hundred Gbps Ethernet NICs (network interface cards). While traditional NFV (network function virtualization) platforms are effective on older hardware, their per-packet run-to-completion (RTC) execution model suffers from stalling on state access due to L1/L2 cache misses. Although previous work applying software prefetching can mitigate these issues, its application is fundamentally limited by the nature of a single execution stream, which restricts it to batch lookups, makes it suffer from control-flow divergence, and requires manual tuning. To address these limitations, we introduce a novel interleaved function stream execution model that exploits function-level parallelism through memory-level parallelism, targeting feature-rich network functions such as 5G Core. To provide visibility into network functions, we introduce a novel programming model based on the principle of Granular Decomposition, which provides deep visibility into state access by decoupling the state in a more fine-grained manner than traditional modular approaches. We integrate these two innovative designs into a new open-source NF platform, which we refer to as GuNFu. We have tested GuNFu on widely deployed network functions such as 5G UPF (User Plane Function), 5G AMF (Access Management Function), NAT (Network Address Translator) and others. Extensive evaluations reveal that GuNFu can achieve throughput ranging from 1.5 to 6 times over the traditional modular approach.
2023-10-01
article · Open access · 1st author · Corresponding
PREDATOR: A Cache Side-Channel Attack Detector Based on Precise Event Monitoring
2022-09-01 · 5 citations
article · Senior author
Recent work has demonstrated the security risk associated with micro-architecture side-channels. The cache timing side-channel is a particularly popular target due to its availability and high leakage bandwidth. Existing proposals for defending against cache side-channel attacks degrade cache performance and/or limit cache sharing, and hence should only be invoked when the system is under attack. A lightweight monitoring mechanism that detects malicious micro-architecture manipulation in realistic environments is essential for the judicious deployment of these defense mechanisms. In this paper, we propose PREDATOR, a cache side-channel attack detector that identifies cache events caused by an attacker. To detect side-channel attacks in noisy environments, we take advantage of the observation that, unlike non-specific noise, an active attacker alters the victim’s micro-architectural states on security-critical accesses and thus causes the victim extra cache events on those accesses. PREDATOR uses precise performance counters to collect detailed information about the victim’s accesses and analyzes location-based deviations. PREDATOR is capable of detecting five different attacks with high accuracy and limited performance overhead in complex noisy execution environments. PREDATOR remains effective even when the attacker slows the attack rate by 256 times. Furthermore, PREDATOR is able to accurately report details about the attack such as the instruction that accesses the attacked data. In the case of GnuPG RSA [20], PREDATOR can pinpoint the square/multiply operations in the Modulo-Reduce algorithm; and in the case of OpenSSL AES [45], it can identify the accesses to the Te-Table.
2022-09-01 · 1 citation
article · Open access
ACM SIGMETRICS Performance Evaluation Review · 2022 · 3 citations
- Computer Science
- Distributed computing
In this paper, we consider the challenges that arise from the need to scale virtualized network functions (VNFs) at 100 Gbps line speed and beyond. Traditional VNF designs are monolithic in state management and scheduling: they internally maintain all states and the operations associated with them. Without proper design considerations, such designs suffer from limitations when scaling at 100 Gbps link speed and beyond: inefficient cache utilization because of contention from frequent control plane activities, computation- and memory-intensive tasks taking up CPU time, and shared states causing synchronization among the cores. We address these limitations by arguing for the need to granularly decompose a VNF into data/control components that are co-located within a server but can be independently scaled among the cores. To realize the approach, we design a "serverless" programming framework with novel abstractions to optimize the data components that must process packets at line speed, reduce contention on the data states, and enable run-time scheduling of different components for improved resource utilization. The abstractions, combined with the runtime system that we design, help NFV developers focus on the logic and correctness of VNF programming without worrying about how VNFs may be scaled in or out. We evaluate our platform by comparing it with monolithic approaches using different workloads and by analyzing the advantages of its separation for scalability, performance determinism, and feature velocity.
Frequent coauthors
- Pen-Chung Yew · 43 shared
- Stephen McCamant · University of Minnesota · 30 shared
- Wenwen Wang · 18 shared
- Wei-Chung Hsu · 17 shared
- Todd C. Mowry · Carnegie Mellon University · 16 shared
- Christopher B. Colohan · Carnegie Mellon University · 15 shared
- Minjun Wu · Twin Cities Orthopedics · 14 shared
- Venkatesan Packirisamy · Nvidia (United States) · 12 shared
Labs
Antonia Zhai · PI
Education
PhD, Computer Science
Carnegie Mellon University
Awards & honors
- IBM Faculty Award (2007)
- ICCD Best Paper Award (2008)