
Andrea Arpaci-Dusseau
Catherine A. Erickson Professor; Susan B. Horwitz Professor · University of Wisconsin-Madison · Computer Sciences
Active 1997–2025
About
Andrea Arpaci-Dusseau is a prominent researcher in storage and computer systems, recognized for leadership, innovation, and impact. Alongside Remzi Arpaci-Dusseau, Andrea is tied for 4th place in the SOSP/OSDI Hall of Fame for the most papers published in these premier systems conferences, with 17 papers, including 6 since 2014. Their research has advanced the understanding and development of storage systems, file systems, and distributed storage. It has also had real-world impact: patented techniques are used by major companies such as Hitachi, IBM, and Intel, among others, and Intel currently licenses their patented recovery techniques. The research has influenced commercial and open-source systems; for example, the transactional checksum is now part of the Linux ext4 file system, improving performance and reliability on millions of machines worldwide. Other innovations include approaches adopted in systems such as ZFS and Alluxio, and improvements to Linux file systems such as ext4 and XFS. Andrea's research group has contributed performance enhancements and bug fixes to commercial products such as the EMC Centera, influenced fault-handling mechanisms at storage vendors including NetApp, and had work incorporated into products by companies such as FusionIO and Apple. The research has also fostered the adoption of open-source tools and concepts in the broader systems community, with impact on databases, key-value stores, and distributed storage systems.
Research topics
- Computer Science
- Artificial Intelligence
- Parallel computing
- Operating system
- Programming language
- Algorithm
- Mathematics
- Computer network
- Database
- Embedded system
- Real-time computing
- Distributed computing
- Theoretical computer science
Selected publications
LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics
Proceedings of the VLDB Endowment · 2025-09-01
Article
We present LiquidCache, a novel pushdown-based disaggregated caching system that evaluates filters on cache servers before transmitting data to compute nodes. Our key observation is that data decoding, not filter evaluation, is the primary bottleneck in existing systems. To address this challenge, we transcode Parquet data into a lightweight "Liquid" format and cache it for efficient filter evaluation. The Liquid format resides solely in the cache layer, requiring no changes to existing deployments and enabling easy adoption of new encodings without breaking compatibility. Through integration with Apache DataFusion and evaluation with ClickBench and TPC-H, we demonstrate that LiquidCache reduces cache CPU time by up to 10× without increasing memory footprint, and reduces network traffic by two orders of magnitude compared to non-pushdown systems.
PANGOLIN: a Comprehensive Testing Framework for Configuration-Rich Key-Value Stores
2025-08-28
Article
In this paper, we present Pangolin, a comprehensive testing framework for configuration-rich key-value stores. To better understand bugs in modern key-value stores and explore domain knowledge for efficiently identifying new ones, we first comprehensively study historical bugs in five mature key-value stores during the last eight years. Then, we design and implement Pangolin, motivated by insights from our bug study, which indicated that most bugs could be identified by systematically testing a small sequence of operations and configurations. Specifically, Pangolin puts these insights into practice by introducing a bounded testing strategy into a spectrum of black-box and fuzzing test procedures. Finally, we use Pangolin to find 20 bugs and reproduce 443 historical bugs in five mature key-value stores (RocksDB, LevelDB, HyperlevelDB, BadgerDB, and Redis), making it an attractive supplement to handwritten test suites.
Crossword: Adaptive Consensus for Dynamic Data-Heavy Workloads
ArXiv.org · 2025-09-08
Preprint · Open access
We present Crossword, a flexible consensus protocol for dynamic data-heavy workloads, a rising challenge in the cloud where replication payload sizes span a wide spectrum and introduce sporadic bandwidth stress. Crossword applies per-instance erasure coding and distributes coded shards intelligently to reduce critical-path data transfer significantly when desirable. Unlike previous approaches that statically assign shards to servers, Crossword enables an adaptive tradeoff between the assignment of shards and quorum size in reaction to dynamic workloads and network conditions, while always retaining the availability guarantee of classic protocols. Crossword handles leader failover gracefully by employing a lazy follower gossiping mechanism that incurs minimal impact on critical-path performance. We implement Crossword (along with relevant protocols) in Gazette, a distributed, replicated, and protocol-generic key-value store written in async Rust. We evaluate Crossword comprehensively to show that it matches the best performance among previous protocols (MultiPaxos, Raft, RSPaxos, and CRaft) in static scenarios, and outperforms them by up to 2.3x under dynamic workloads and network conditions. Our integration of Crossword with CockroachDB brings 1.32x higher aggregate throughput to TPC-C under 5-way replication. We will open-source Gazette upon publication.
Bodega: Serving Linearizable Reads Locally from Anywhere at Anytime via Roster Leases
ArXiv.org · 2025-09-08
Preprint · Open access
We present Bodega, the first consensus protocol that serves linearizable reads locally from any desired node, regardless of interfering writes. Bodega achieves this via a novel roster leases algorithm that safeguards the roster, a new notion of cluster metadata. The roster is a generalization of leadership; it tracks arbitrary subsets of replicas as responder nodes for local reads. A consistent agreement on the roster is established through roster leases, an all-to-all leasing mechanism that generalizes existing all-to-one leasing approaches (Leader Leases, Quorum Leases), unlocking a new point in the protocol design space. Bodega further employs optimistic holding and early accept notifications to minimize interruption from interfering writes, and incorporates smart roster coverage and lightweight heartbeats to maximize practicality. Bodega is a non-intrusive extension to classic consensus; it imposes no special requirements on writes other than a responder-covering quorum. We implement Bodega and related works in Vineyard, a protocol-generic replicated key-value store written in async Rust. We compare it to previous protocols (Leader Leases, EPaxos, PQR, and Quorum Leases) and two production coordination services (etcd and ZooKeeper). Bodega speeds up average client read requests by 5.6x-13.1x on real WAN clusters versus previous approaches under moderate write interference, delivers comparable write performance, supports fast proactive roster changes as well as fault tolerance via leases, and closely matches the performance of sequentially-consistent etcd and ZooKeeper deployments across all YCSB workloads. We will open-source Vineyard upon publication.
Data-Centric Serverless Computing with LAMBDASTORE
Preprints.org · 2025-12-11
Preprint · Open access
LAMBDASTORE is a new serverless platform with an integrated storage engine tailored for stateful serverless workloads. Its compute-storage co-design colocates serverless functions with their associated data, yielding significant performance gains. It also leverages the transaction interface of its storage engine to provide serializable workflows and exactly-once semantics. This paper presents the design of LAMBDASTORE and introduces three key contributions. First, it adopts an object-oriented model in which functions are bundled with their associated data, enabling function execution to be scheduled directly at the data’s location. Second, the storage layer provides efficient transaction processing by dynamically adjusting lock granularity and employing a customized optimistic concurrency control protocol. Third, to enable colocation without sacrificing elasticity, the system supports data migration and lightweight replication at the granularity of individual objects. Experiments show that LAMBDASTORE outperforms conventional serverless platforms, especially in read-heavy workloads. In such settings, LAMBDASTORE achieves throughput orders of magnitude higher than existing systems, while maintaining average end-to-end latencies below 20 ms.
Getting the MOST out of your Storage Hierarchy with Mirror-Optimized Storage Tiering
ArXiv.org · 2025-12-02
Preprint · Open access
We present Mirror-Optimized Storage Tiering (MOST), a novel tiering-based approach optimized for modern storage hierarchies. The key idea of MOST is to combine the load balancing advantages of mirroring with the space-efficiency advantages of tiering. Specifically, MOST dynamically mirrors a small amount of hot data across storage tiers to efficiently balance load, avoiding costly migrations. As a result, MOST is as space-efficient as classic tiering while achieving better bandwidth utilization under I/O-intensive workloads. We implement MOST in Cerberus, a user-level storage management layer based on CacheLib. We show the efficacy of Cerberus through a comprehensive empirical study: across a range of static and dynamic workloads, Cerberus achieves better throughput than competing approaches on modern storage hierarchies, especially under I/O-intensive and dynamic workloads.
2025-10-01
Article · Open access
Memory tiering in datacenters does not achieve its full potential due to hotness fragmentation: the intermingling of hot and cold objects within memory pages. This fragmentation prevents page-based reclamation systems from distinguishing truly hot pages from pages containing mostly cold objects, fundamentally limiting memory efficiency despite highly skewed accesses. We introduce address-space engineering: dynamically reorganizing application virtual address spaces to create uniformly hot and cold regions that any page-level tiering backend can manage effectively. HADES demonstrates this frontend/backend approach through a compiler-runtime system that tracks and migrates objects based on access patterns, requiring minimal developer intervention. Evaluations across ten data structures achieve up to 70% memory reduction with 3% performance overhead, showing that address-space engineering enables existing reclamation systems to reclaim memory aggressively without performance degradation.
Revealing the Unstable Foundations of eBPF-Based Kernel Extensions
2025-03-26 · 3 citations
Article · Open access
eBPF programs significantly enhance kernel capabilities, but encounter substantial compatibility challenges due to their deep integration with unstable kernel internals. We introduce DepSurf, a tool that identifies dependency mismatches between eBPF programs and kernel images. Our analysis of 25 kernel images spanning 8 years reveals that dependency mismatches are pervasive, stemming from kernel source code evolution, diverse configuration options, and intricate compilation processes. We apply DepSurf to 53 real-world eBPF programs, and find that 83% are impacted by dependency mismatches, underscoring the urgent need for systematic dependency analysis. By identifying these mismatches, DepSurf enables a more robust development and maintenance process for eBPF programs, enhancing their reliability across a wide range of kernels.
arXiv (Cornell University) · 2024-09-03
Preprint · Open access
We present a practical model of non-transactional consistency levels in the context of distributed data replication. Unlike prior work, our simple Shared Object Pool (SOP) model defines common consistency levels in a unified framework centered around the single concept of ordering. This naturally reflects modern cloud object storage services and is thus easy to understand. We show that a consistency level can be intuitively defined by specifying two types of constraints on the validity of orderings allowed by the level: convergence, which bounds the lineage shape of the ordering, and relationship, which bounds the relative positions between operations. We give examples of representative protocols and systems, and discuss their availability upper bound. To further demonstrate the expressiveness and practical relevance of our model, we use it to implement a Jepsen-integrated consistency checker for the four most common levels (linearizable, sequential, causal+, and eventual); the checker analyzes consistency conformity for small-scale histories of real system runs (etcd, ZooKeeper, and RabbitMQ).
Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution
2024-06-27
Article
We present Robust Alternative Execution (RAE), an approach to transparently mask runtime errors in performance-oriented filesystems via temporarily executing an alternative shadow filesystem. A shadow filesystem has the primary goal of robustness, achieved through a simple implementation without performance optimizations and concurrency while adhering to the same API and on-disk formats as the base filesystem it enhances. While the base performance-oriented filesystem may contain bugs, the shadow implementation is formally verified, leveraging advancements in the verification of low-level systems code. In the common case, the base filesystem executes and delivers high performance to applications; however, when a bug is triggered, the slow-but-correct shadow takes over, updates state correctly, and then resumes the base, thus providing high availability.
Recent grants
DC: Small: Collaborative Research: DARE: Declarative and Scalable Recovery
NSF · $190k · 2010–2013
CAREER: Exploiting Gray-Box Techniques in Systems
NSF · $350k · 2002–2008
Frequent coauthors
- Remzi H. Arpaci-Dusseau · 210 shared
- Ramnatthan Alagappan · University of Illinois Urbana-Champaign · 26 shared
- Haryadi S. Gunawi · University of Chicago · 19 shared
- Muthian Sivathanu · 17 shared
- Aishwarya Ganesan · University of Illinois Urbana-Champaign · 17 shared
- Lakshmi N. Bairavasundaram · NetApp (United States) · 17 shared
- John Bent · Los Alamos National Laboratory · 16 shared
- Vijay Chidambaram · 15 shared
Labs
The ADvanced Systems Laboratory (ADSL) · PI
The ADvanced Systems Laboratory (ADSL) conducts research in the field of computer systems, focusing on storage and file systems.