Aurojit Panda

Associate Professor in Computer Science · Verified

New York University · Computer Science

Active 2008–2026

h-index: 40

Citations: 5.3k

Papers: 137 (51 in the last 5 years)

Funding: $810k (1 active grant)

About

Aurojit Panda is an associate professor in Computer Science at New York University (NYU). He earned his PhD from the University of California, Berkeley, where he was advised by Scott Shenker and conducted research in the NetSys Lab. Prior to his doctoral studies, he received an Sc.B. with honors in Math and Computer Science from Brown University. Between Brown and Berkeley, he spent several years working on the Midori kernel at Microsoft. Before joining NYU, he worked as a software developer at Nefeli Networks, a startup specializing in network function orchestration. Professor Panda's research focuses on systems and networking problems, with a particular interest in improving system reliability: identifying bugs before deployment and enhancing fault tolerance. He works on a broad range of topics within this domain and is open to collaborating with NYU undergraduates, master's students, and PhD students who have relevant coursework or experience in systems or networking. His teaching includes Distributed Systems and Undergraduate Operating Systems.

Research topics

Computer Science
Operating system
Computer Security
Distributed computing
Software engineering
Embedded system
Programming language
World Wide Web

Selected publications

Probabilistic Fair Ordering of Events
Open MIND · 2026-02-09
preprint
A growing class of applications depends on fair ordering, where events that occur earlier should be processed before later ones. Providing such guarantees is difficult in practice because clock synchronization is inherently imperfect: events generated at different clients within a short time window may carry timestamps that cannot be reliably ordered. Rather than attempting to eliminate synchronization error, we embrace it and establish a probabilistically fair sequencing process. Tommy is a sequencer that uses a statistical model of per-clock synchronization error to compare noisy timestamps probabilistically. Although this enables ordering of two events, the probabilistic comparator is intransitive, making global ordering non-trivial. We address this challenge by mapping the sequencing problem to a classical ranking problem from social choice theory, which offers principled mechanisms for reasoning with intransitive comparisons. Using this formulation, Tommy produces a partial order of events, achieving significantly better fairness than a Spanner TrueTime-based baseline approach.
DOI
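The pairwise comparison described in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each clock's synchronization error is independent, zero-mean Gaussian noise with a known standard deviation; the function name `prob_before` and this error model are hypothetical and are not taken from the paper, whose statistical model may differ.

```python
import math


def prob_before(t_a: float, sigma_a: float, t_b: float, sigma_b: float) -> float:
    """Estimate the probability that event a truly occurred before event b.

    Assumption (not from the paper): each reported timestamp equals the true
    time plus independent zero-mean Gaussian error with the given std dev.
    """
    # The difference of the two true times is then Gaussian with mean
    # (t_b - t_a) and variance sigma_a^2 + sigma_b^2.
    spread = math.sqrt(sigma_a ** 2 + sigma_b ** 2)
    if spread == 0.0:
        # No uncertainty at all: fall back to a deterministic comparison.
        if t_a == t_b:
            return 0.5
        return 1.0 if t_a < t_b else 0.0
    z = (t_b - t_a) / spread
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Under this model, two events whose timestamps differ by much less than the combined clock error get a probability close to 0.5, which is the case the abstract describes as not reliably orderable.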
Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication
arXiv (Cornell University) · 2026-01-06
preprint · Open access
As Byzantine Fault Tolerant (BFT) protocols begin to be used in permissioned blockchains for user-facing applications such as payments, it is crucial that they provide low latency. In pursuit of low latency, some recently proposed BFT consensus protocols employ a leaderless optimistic fast path, in which clients broadcast their requests directly to replicas without first serializing requests at a leader, resulting in an end-to-end commit latency of 2 message delays ($2\Delta$) during fault-free, synchronous periods. However, such a fast path only works if there is no contention: concurrent contending requests can cause replicas to diverge if they receive conflicting requests in different orders, triggering costly recovery procedures. In this work, we present Aspen, a leaderless BFT protocol that achieves a near-optimal latency of $2\Delta + \varepsilon$, where $\varepsilon$ indicates a short waiting delay. Aspen removes the no-contention condition by utilizing a best-effort sequencing layer based on loosely synchronized clocks and network delay estimates. Aspen requires $n = 3f + 2p + 1$ replicas to cope with up to $f$ Byzantine nodes. The $2p$ extra nodes allow Aspen's fast path to proceed even if up to $p$ replicas diverge due to unpredictable network delays. When its optimistic conditions do not hold, Aspen falls back to a PBFT-style protocol, guaranteeing safety and liveness under partial synchrony. In experiments with wide-area distributed replicas, Aspen commits requests in less than 75 ms, a 1.2$\times$ to 3.3$\times$ improvement compared to previous protocols, while supporting 19,000 requests per second.
Publisher DOI
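As a small worked example of the replica-count formula quoted in the abstract above, the sketch below evaluates $n = 3f + 2p + 1$ for given $f$ and $p$. The function name `aspen_replica_count` is a hypothetical helper introduced here for illustration; it is not part of any Aspen implementation.

```python
def aspen_replica_count(f: int, p: int) -> int:
    """Return n = 3f + 2p + 1, the replica count stated in the abstract.

    f: number of Byzantine replicas to tolerate.
    p: number of replicas allowed to diverge on the fast path.
    """
    if f < 0 or p < 0:
        raise ValueError("f and p must be non-negative")
    return 3 * f + 2 * p + 1


if __name__ == "__main__":
    # Print a few configurations to show how the count grows with p.
    for f, p in [(1, 0), (1, 1), (2, 1)]:
        print(f"f={f}, p={p} -> n={aspen_replica_count(f, p)}")
```

With $p = 0$ the formula reduces to $3f + 1$, the replica count PBFT requires; each additional tolerated divergent replica adds two replicas.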
Probabilistic Fair Ordering of Events
ArXiv.org · 2026-02-09
article · Open access
Publisher OA PDF
CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting
2026-03-10 · 1 citation
article · Open access · Senior author
3D Gaussian Splatting (3DGS) is an increasingly popular novel view synthesis approach due to its fast rendering time and high-quality output. However, scaling 3DGS to large (or intricate) scenes is challenging due to its substantial memory requirement, which exceeds the memory capacity of most GPUs. In this paper, we describe CLM, a system that allows 3DGS to render large scenes using a single consumer-grade GPU, e.g., RTX4090. It does so by offloading Gaussians to CPU memory and loading them into GPU memory only when necessary. To improve performance and reduce communication overheads, CLM uses a novel offloading strategy based on insights into 3DGS's memory access patterns. This strategy enables efficient pipelining, which overlaps GPU-to-CPU communication, GPU computation, and CPU computation. Furthermore, CLM exploits these access patterns to reduce communication volume. Our evaluation shows that the resulting implementation can render a large scene that requires 102 million Gaussians on a single RTX4090 and achieve state-of-the-art reconstruction quality. The code is open-sourced at: https://github.com/nyu-systems/CLM-GS
Publisher DOI
Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication
ArXiv.org · 2026-01-06
article · Open access
Publisher OA PDF
Front Matter, Table of Contents, Preface, Conference Organization
Leibniz-Zentrum für Informatik (Schloss Dagstuhl) · 2026-01-01
article · Open access · Senior author
Publisher OA PDF DOI
OASIcs, Volume 139, NINeS 2026, Complete Volume
Leibniz-Zentrum für Informatik (Schloss Dagstuhl) · 2026-01-01
other · Open access · Senior author
Publisher DOI
Elastic Scaling of Real-Time Communication Services
IEEE Transactions on Network and Service Management · 2026-01-01
article
Real-time Communications (RTC) services, including multiparty conferencing, live streaming, and cloud gaming, rely on a large-scale media plane infrastructure that provides real-time audio/video processing to clients. Unfortunately, off-the-shelf RTC services are not elastically scalable. As a result, operators must provision media servers to meet peak demand, resulting in resource under-utilization and high cost. Given that microservice orchestrators like Kubernetes now allow web services to scale transparently and economically, this paper looks at applying the same approach to large-scale RTC services. We find that this is challenging for two reasons: (a) the default network dataplane underlying Kubernetes does not meet the demanding traffic management, performance, and real-time requirements of RTC; and (b) current autoscaling policies are ill-suited to RTC. We address these challenges by designing an RTC-specific service mesh that pushes media traffic processing into the OS kernel and by designing new RTC-specific Kubernetes autoscaling policies. Our evaluation on a functional VoIP test-bed shows that this combination makes it possible to deploy elastically scalable RTC services with 100× lower jitter and 700× lower RTT than the current state of the art.
Publisher DOI
It Takes Two to Entangle
2026-03-10
article · Open access · Senior author
Distributed machine learning training and inference are common today because large models require more memory and compute than a single GPU can provide. Distributed models are generally produced by programmers who take a sequential model specification and apply several distribution strategies to distribute state and computation across GPUs. Unfortunately, bugs can be introduced in the process, and a distributed model implementation's outputs might differ from the sequential model's outputs. In this paper, we describe an approach to statically identify such bugs by checking model refinement, that is, whether the sequential model's outputs can be reconstructed from the distributed model's outputs. Our approach, implemented in Entangle, uses iterative rewriting to prove model refinement. Our approach can scale to today's large models and deployments: we evaluate it using GPT and Llama-3. Further, it provides actionable outputs that aid in bug localization.
Publisher DOI
On Scaling Up 3D Gaussian Splatting Training
Lecture Notes in Computer Science · 2025-01-01 · 13 citations
book-chapter
Publisher DOI

Recent grants

EAGER: Collaborative Research: Towards an Extensible Internet
NSF · $30k · 2021–2022
CAREER: Assertions for Distributed Applications
NSF · $700k · 2022–2027
Collaborative Research: PPoSS: Planning: Making Smart Use of SmartNICs
NSF · $80k · 2020–2022

Frequent coauthors

Scott Shenker
University of California, Berkeley
158 shared
Sylvia Ratnasamy
Google (United States)
39 shared
James McCauley
Mount Holyoke College
26 shared
Mooly Sagiv
23 shared
Arvind Krishnamurthy
Stanford University
21 shared
Colin Scott
Microsoft Research (India)
16 shared
Justine Sherry
Carnegie Mellon University
15 shared
Yotam Harchol
15 shared

Education

Ph.D.
UC Berkeley
Sc.B. (honors), Math-CS
Brown

Awards & honors

HotOS 2023 Best Paper Award
HotNets 2023 Best Student Paper Award
HotNets 2022 Best Paper Award
SIGCOMM 2019 Best Student Paper Award
