Remzi Arpaci-Dusseau

Grace Wahba Professor; Vilas Distinguished Achievement Professor · Verified

University of Wisconsin-Madison · Computer Sciences

Active 1997–2025

h-index: 48

Citations: 7.7k

Papers: 228 (26 in last 5y)

Funding: $7.7M

Research topics

Computer Science
Artificial Intelligence
Parallel computing
Operating system
Mathematics
Embedded system
Computer network
Algorithm
Database
Programming language
Distributed computing
Theoretical computer science
Real-time computing

Selected publications

Data-Centric Serverless Computing with LAMBDASTORE
Preprints.org · 2025-12-11
preprint · Open access · Senior author
LAMBDASTORE is a new serverless platform with an integrated storage engine tailored for stateful serverless workloads. Its compute-storage co-design colocates serverless functions with their associated data, yielding significant performance gains. It also leverages the transaction interface of its storage engine to provide serializable workflows and exactly-once semantics. This paper presents the design of LAMBDASTORE and introduces three key contributions. First, it adopts an object-oriented model in which functions are bundled with their associated data, enabling function execution to be scheduled directly at the data’s location. Second, the storage layer provides efficient transaction processing by dynamically adjusting lock granularity and employing a customized optimistic concurrency control protocol. Third, to enable colocation without sacrificing elasticity, the system supports data migration and lightweight replication at the granularity of individual objects. Experiments show that LAMBDASTORE outperforms conventional serverless platforms, especially in read-heavy workloads. In such settings, LAMBDASTORE achieves throughput orders of magnitude higher than existing systems, while maintaining average end-to-end latencies below 20 ms.
LiquidCache: Efficient Pushdown Caching for Cloud-Native Data Analytics
Proceedings of the VLDB Endowment · 2025-09-01
article · Senior author
We present LiquidCache, a novel pushdown-based disaggregated caching system that evaluates filters on cache servers before transmitting data to compute nodes. Our key observation is that data decoding, not filter evaluation, is the primary bottleneck in existing systems. To address this challenge, we transcode Parquet data into a lightweight "Liquid" format and cache it for efficient filter evaluation. The Liquid format resides solely in the cache layer, requiring no changes to existing deployments and enabling easy adoption of new encodings without breaking compatibility. Through integration with Apache DataFusion and evaluation with ClickBench and TPC-H, we demonstrate that LiquidCache reduces cache CPU time by up to 10× without increasing memory footprint, and reduces network traffic by two orders of magnitude compared to non-pushdown systems.
Getting the MOST out of your Storage Hierarchy with Mirror-Optimized Storage Tiering
ArXiv.org · 2025-12-02
preprint · Open access · Senior author
We present Mirror-Optimized Storage Tiering (MOST), a novel tiering-based approach optimized for modern storage hierarchies. The key idea of MOST is to combine the load balancing advantages of mirroring with the space-efficiency advantages of tiering. Specifically, MOST dynamically mirrors a small amount of hot data across storage tiers to efficiently balance load, avoiding costly migrations. As a result, MOST is as space-efficient as classic tiering while achieving better bandwidth utilization under I/O-intensive workloads. We implement MOST in Cerberus, a user-level storage management layer based on CacheLib. We show the efficacy of Cerberus through a comprehensive empirical study: across a range of static and dynamic workloads, Cerberus achieves better throughput than competing approaches on modern storage hierarchies, especially under I/O-intensive and dynamic workloads.
PANGOLIN: a Comprehensive Testing Framework for Configuration-Rich Key-Value Stores
2025-08-28
article · Senior author
In this paper, we present Pangolin, a comprehensive testing framework for configuration-rich key-value stores. To better understand bugs in modern key-value stores and explore domain knowledge for efficiently identifying new ones, we first comprehensively study historical bugs in five mature key-value stores during the last eight years. Then, we design and implement Pangolin, which is motivated by insights from our bug study, which indicated most bugs could be identified by systematically testing a small sequence of operations and configurations. Specifically, Pangolin practices these insights by introducing a bounded testing strategy into a spectrum of black-box and fuzzing test procedures. Finally, we utilize Pangolin to find 20 bugs and reproduce 443 historical bugs in five mature key-value stores (RocksDB, LevelDB, HyperlevelDB, BadgerDB, and Redis), making it an attractive supplement to handwritten test suites.
Crossword: Adaptive Consensus for Dynamic Data-Heavy Workloads
ArXiv.org · 2025-09-08
preprint · Open access · Senior author
We present Crossword, a flexible consensus protocol for dynamic data-heavy workloads, a rising challenge in the cloud where replication payload sizes span a wide spectrum and introduce sporadic bandwidth stress. Crossword applies per-instance erasure coding and distributes coded shards intelligently to reduce critical-path data transfer significantly when desirable. Unlike previous approaches that statically assign shards to servers, Crossword enables an adaptive tradeoff between the assignment of shards and quorum size in reaction to dynamic workloads and network conditions, while always retaining the availability guarantee of classic protocols. Crossword handles leader failover gracefully by employing a lazy follower gossiping mechanism that incurs minimal impact on critical-path performance. We implement Crossword (along with relevant protocols) in Gazette, a distributed, replicated, and protocol-generic key-value store written in async Rust. We evaluate Crossword comprehensively to show that it matches the best performance among previous protocols (MultiPaxos, Raft, RSPaxos, and CRaft) in static scenarios, and outperforms them by up to 2.3x under dynamic workloads and network conditions. Our integration of Crossword with CockroachDB brings 1.32x higher aggregate throughput to TPC-C under 5-way replication. We will open-source Gazette upon publication.
Bodega: Serving Linearizable Reads Locally from Anywhere at Anytime via Roster Leases
ArXiv.org · 2025-09-08
preprint · Open access · Senior author
We present Bodega, the first consensus protocol that serves linearizable reads locally from any desired node, regardless of interfering writes. Bodega achieves this via a novel roster leases algorithm that safeguards the roster, a new notion of cluster metadata. The roster is a generalization of leadership; it tracks arbitrary subsets of replicas as responder nodes for local reads. A consistent agreement on the roster is established through roster leases, an all-to-all leasing mechanism that generalizes existing all-to-one leasing approaches (Leader Leases, Quorum Leases), unlocking a new point in the protocol design space. Bodega further employs optimistic holding and early accept notifications to minimize interruption from interfering writes, and incorporates smart roster coverage and lightweight heartbeats to maximize practicality. Bodega is a non-intrusive extension to classic consensus; it imposes no special requirements on writes other than a responder-covering quorum. We implement Bodega and related works in Vineyard, a protocol-generic replicated key-value store written in async Rust. We compare it to previous protocols (Leader Leases, EPaxos, PQR, and Quorum Leases) and two production coordination services (etcd and ZooKeeper). Bodega speeds up average client read requests by 5.6x-13.1x on real WAN clusters versus previous approaches under moderate write interference, delivers comparable write performance, supports fast proactive roster changes as well as fault tolerance via leases, and closely matches the performance of sequentially-consistent etcd and ZooKeeper deployments across all YCSB workloads. We will open-source Vineyard upon publication.
Zerrow: True Zero-Copy Arrow Pipelines in Bauplan
ArXiv.org · 2025-04-08
preprint · Open access
Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. While Arrow is known as the "zero copy format", in practice, limited Linux kernel support for shared memory makes it difficult to avoid copying entirely. In this work, we introduce several new techniques to eliminate nearly all copying from pipelines: in particular, we implement a new kernel module that performs de-anonymization, thus eliminating a copy to intermediate data. We conclude by sharing our preliminary evaluation on different workload types, as well as discussing our plan for future improvements.
Tidying Up the Address Space
2025-10-01
article · Open access
Memory tiering in datacenters does not achieve its full potential due to hotness fragmentation: the intermingling of hot and cold objects within memory pages. This fragmentation prevents page-based reclamation systems from distinguishing truly hot pages from pages containing mostly cold objects, fundamentally limiting memory efficiency despite highly skewed accesses. We introduce address-space engineering: dynamically reorganizing application virtual address spaces to create uniformly hot and cold regions that any page-level tiering backend can manage effectively. HADES demonstrates this frontend/backend approach through a compiler-runtime system that tracks and migrates objects based on access patterns, requiring minimal developer intervention. Evaluations across ten data structures achieve up to 70% memory reduction with 3% performance overhead, showing that address-space engineering enables existing reclamation systems to reclaim memory aggressively without performance degradation.
Revealing the Unstable Foundations of eBPF-Based Kernel Extensions
2025-03-26 · 3 citations
article · Open access · Senior author
eBPF programs significantly enhance kernel capabilities, but encounter substantial compatibility challenges due to their deep integration with unstable kernel internals. We introduce DepSurf, a tool that identifies dependency mismatches between eBPF programs and kernel images. Our analysis of 25 kernel images spanning 8 years reveals that dependency mismatches are pervasive, stemming from kernel source code evolution, diverse configuration options, and intricate compilation processes. We apply DepSurf to 53 real-world eBPF programs, and find that 83% are impacted by dependency mismatches, underscoring the urgent need for systematic dependency analysis. By identifying these mismatches, DepSurf enables a more robust development and maintenance process for eBPF programs, enhancing their reliability across a wide range of kernels.
Shadow Filesystems: Recovering from Filesystem Runtime Errors via Robust Alternative Execution
2024-06-27
article
We present Robust Alternative Execution (RAE), an approach to transparently mask runtime errors in performance-oriented filesystems via temporarily executing an alternative shadow filesystem. A shadow filesystem has the primary goal of robustness, achieved through a simple implementation without performance optimizations and concurrency while adhering to the same API and on-disk formats as the base filesystem it enhances. While the base performance-oriented filesystem may contain bugs, the shadow implementation is formally verified, leveraging advancements in the verification of low-level systems code. In the common case, the base filesystem executes and delivers high performance to applications; however, when a bug is triggered, the slow-but-correct shadow takes over, updates state correctly, and then resumes the base, thus providing high availability.