Aurojit Panda
Associate Professor in Computer Science · New York University · Computer Science
Active 2008–2026
About
Aurojit Panda is an associate professor in Computer Science at New York University (NYU). He earned his PhD from the University of California, Berkeley, where he was advised by Scott Shenker and conducted research in the NetSys Lab. Prior to his doctoral studies, he received an Sc.B. with honors in Math and Computer Science from Brown University. Before joining NYU, Panda worked as a software developer at Nefeli Networks, a startup specializing in network function orchestration solutions. Between his time at Brown and Berkeley, he also spent several years working on the Midori kernel at Microsoft. Professor Panda's research focuses on systems and networking problems, with a particular interest in improving system reliability. His work aims to identify bugs before deployment and enhance fault tolerance in systems. He is engaged in a broad range of topics within this domain and is open to collaborating with NYU undergraduate, master's, and PhD students who have relevant coursework or experience in systems or networking. His teaching includes courses such as Distributed Systems and Undergraduate Operating Systems.
Research topics
- Computer Science
- Operating system
- Computer Security
- Distributed computing
- Software engineering
- Embedded system
- Programming language
- World Wide Web
Selected publications
Probabilistic Fair Ordering of Events
Open MIND · 2026-02-09
preprint
A growing class of applications depends on fair ordering, where events that occur earlier should be processed before later ones. Providing such guarantees is difficult in practice because clock synchronization is inherently imperfect: events generated at different clients within a short time window may carry timestamps that cannot be reliably ordered. Rather than attempting to eliminate synchronization error, we embrace it and establish a probabilistically fair sequencing process. Tommy is a sequencer that uses a statistical model of per-clock synchronization error to compare noisy timestamps probabilistically. Although this enables ordering of two events, the probabilistic comparator is intransitive, making global ordering non-trivial. We address this challenge by mapping the sequencing problem to a classical ranking problem from social choice theory, which offers principled mechanisms for reasoning with intransitive comparisons. Using this formulation, Tommy produces a partial order of events, achieving significantly better fairness than a Spanner TrueTime-based baseline approach.
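The pairwise comparison of noisy timestamps described in the abstract above can be illustrated with a minimal Python sketch. It assumes that each clock's synchronization error is an independent zero-mean Gaussian with a known standard deviation. That error model, the function name `prob_before`, and the sample values are illustrative assumptions, not details taken from Tommy.

```python
import math


def prob_before(ts_a, sigma_a, ts_b, sigma_b):
    """Estimate the probability that event A truly occurred before event B.

    Assumption (not from the paper): each reported timestamp equals the true
    time plus independent zero-mean Gaussian error with the given std dev.
    """
    # Under that assumption, (true_b - true_a) is Gaussian with mean
    # (ts_b - ts_a) and variance sigma_a**2 + sigma_b**2.
    spread = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    if spread == 0:
        # Perfect clocks: the comparison is deterministic (ties count as 0.5).
        if ts_a == ts_b:
            return 0.5
        return 1.0 if ts_a < ts_b else 0.0
    z = (ts_b - ts_a) / spread
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical timestamps in microseconds, with per-clock error std devs.
print(prob_before(ts_a=0.0, sigma_a=3.0, ts_b=5.0, sigma_b=4.0))
```

When the timestamp gap is small relative to the combined error, the result stays close to 0.5, which is the situation in which the abstract says events cannot be reliably ordered.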
Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication
arXiv (Cornell University) · 2026-01-06
preprint · Open access
As Byzantine Fault Tolerant (BFT) protocols begin to be used in permissioned blockchains for user-facing applications such as payments, it is crucial that they provide low latency. In pursuit of low latency, some recently proposed BFT consensus protocols employ a leaderless optimistic fast path, in which clients broadcast their requests directly to replicas without first serializing requests at a leader, resulting in an end-to-end commit latency of 2 message delays ($2\Delta$) during fault-free, synchronous periods. However, such a fast path only works if there is no contention: concurrent contending requests can cause replicas to diverge if they receive conflicting requests in different orders, triggering costly recovery procedures. In this work, we present Aspen, a leaderless BFT protocol that achieves a near-optimal latency of $2\Delta + \varepsilon$, where $\varepsilon$ indicates a short waiting delay. Aspen removes the no-contention condition by utilizing a best-effort sequencing layer based on loosely synchronized clocks and network delay estimates. Aspen requires $n = 3f + 2p + 1$ replicas to cope with up to $f$ Byzantine nodes. The $2p$ extra nodes allow Aspen's fast path to proceed even if up to $p$ replicas diverge due to unpredictable network delays. When its optimistic conditions do not hold, Aspen falls back to a PBFT-style protocol, guaranteeing safety and liveness under partial synchrony. In experiments with wide-area distributed replicas, Aspen commits requests in less than 75 ms, a 1.2 to 3.3$\times$ improvement compared to previous protocols, while supporting 19,000 requests per second.
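The replica requirement stated in the abstract above is simple arithmetic, shown here as a short Python sketch. The function name `aspen_replica_count` and the sample values of f and p are hypothetical. Only the formula n = 3f + 2p + 1 comes from the abstract.

```python
def aspen_replica_count(f, p):
    """Replicas required by the abstract's formula n = 3f + 2p + 1.

    f: number of Byzantine nodes tolerated.
    p: number of replicas allowed to diverge on the fast path.
    """
    if f < 0 or p < 0:
        raise ValueError("f and p must be non-negative")
    return 3 * f + 2 * p + 1


# Illustrative settings only; the abstract does not list specific deployments.
for f, p in [(1, 0), (1, 1), (2, 1)]:
    print(f"f={f}, p={p} -> n={aspen_replica_count(f, p)}")
```

With p = 0 the formula reduces to 3f + 1, the replica count used by PBFT.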
Probabilistic Fair Ordering of Events
ArXiv.org · 2026-02-09
article · Open access
CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting
2026-03-10 · 1 citation
article · Open access · Senior author
3D Gaussian Splatting (3DGS) is an increasingly popular novel view synthesis approach due to its fast rendering time and high-quality output. However, scaling 3DGS to large (or intricate) scenes is challenging due to its substantial memory requirement, which exceeds the memory capacity of most GPUs. In this paper, we describe CLM, a system that allows 3DGS to render large scenes using a single consumer-grade GPU, e.g., RTX4090. It does so by offloading Gaussians to CPU memory, and loading them into GPU memory only when necessary. To improve performance and reduce communication overheads, CLM uses a novel offloading strategy based on insights into 3DGS's memory access patterns. This strategy enables efficient pipelining, which overlaps GPU-to-CPU communication, GPU computation, and CPU computation. Furthermore, CLM exploits these access patterns to reduce communication volume. Our evaluation shows that the resulting implementation can render a large scene that requires 102 million Gaussians on a single RTX4090 and achieve state-of-the-art reconstruction quality. The code is open-sourced at: https://github.com/nyu-systems/CLM-GS
Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication
ArXiv.org · 2026-01-06
article · Open access
Front Matter, Table of Contents, Preface, Conference Organization
Leibniz-Zentrum für Informatik (Schloss Dagstuhl) · 2026-01-01
article · Open access · Senior author
OASIcs, Volume 139, NINeS 2026, Complete Volume
Leibniz-Zentrum für Informatik (Schloss Dagstuhl) · 2026-01-01
other · Open access · Senior author
Elastic Scaling of Real-Time Communication Services
IEEE Transactions on Network and Service Management · 2026-01-01
article
Real-time Communications (RTC) services, including multiparty conferencing, live streaming, and cloud-gaming, rely on a large-scale media plane infrastructure that provides real-time audio/video processing to clients. Unfortunately, off-the-shelf RTC services are not elastically scalable. As a result, operators must provision media servers to meet peak demand, resulting in resource under-utilization and high cost. Given that microservice orchestrators like Kubernetes now allow web services to scale transparently and economically, this paper looks at applying the same approach to large-scale RTC services. We find that this is challenging for two reasons: (a) the default network dataplane underlying Kubernetes does not meet the compelling traffic management, performance, and real-time requirements of RTC; and (b) current autoscaling policies are ill-suited to RTC. We address these challenges by designing an RTC-specific service mesh that pushes media traffic processing into the OS kernel and by designing new RTC-specific Kubernetes autoscaling policies. Our evaluation on a functional VoIP test-bed shows that this combination makes it possible to deploy elastically scalable RTC services with 100× lower jitter and 700× lower RTT than the current state of the art.
2026-03-10
article · Open access · Senior author
Distributed machine learning training and inference are common today because today's large models require more memory and compute than can be provided by a single GPU. Distributed models are generally produced by programmers who take a sequential model specification and apply several distribution strategies to distribute state and computation across GPUs. Unfortunately, bugs can be introduced in the process, and a distributed model implementation's outputs might differ from the sequential model's outputs. In this paper, we describe an approach to statically identify such bugs by checking model refinement, that is, can the sequential model's outputs be reconstructed from the distributed model's outputs? Our approach, implemented in Entangle, uses iterative rewriting to prove model refinement. Our approach can scale to today's large models and deployments: we evaluate it using GPT and Llama-3. Further, it provides actionable outputs that aid in bug localization.
On Scaling Up 3D Gaussian Splatting Training
Lecture notes in computer science · 2025-01-01 · 13 citations
book-chapter
Recent grants
EAGER: Collaborative Research: Towards an Extensible Internet
NSF · $30k · 2021–2022
CAREER: Assertions for Distributed Applications
NSF · $700k · 2022–2027
Collaborative Research: PPoSS: Planning: Making Smart Use of SmartNICs
NSF · $80k · 2020–2022
Frequent coauthors
- Scott Shenker · University of California, Berkeley · 158 shared
- Sylvia Ratnasamy · Google (United States) · 39 shared
- James McCauley · Mount Holyoke College · 26 shared
- Mooly Sagiv · 23 shared
- Arvind Krishnamurthy · Stanford University · 21 shared
- Colin Scott · Microsoft Research (India) · 16 shared
- Justine Sherry · Carnegie Mellon University · 15 shared
- Yotam Harchol · 15 shared
Education
Ph.D.
UC Berkeley
Other, Math-CS
Brown
Awards & honors
- HotOS 2023 Best Paper Award
- HotNets 2023 Best Student Paper Award
- HotNets 2022 Best Paper Award
- SIGCOMM 2019 Best Student Paper Award