Sudarsun Kannan

Associate Professor · Verified

Rutgers University · Computer Science

Active 2011–2025

h-index: 13
Citations: 661
Papers: 43 (21 in the last 5 years)
Funding: $1.1M (1 active)

About

Sudarsun Kannan is an Associate Professor in the Department of Computer Science at Rutgers University, specializing in Operating Systems and their implications for Computer Architecture, Distributed Systems, and High-performance Computing. His research group focuses on building systems that efficiently manage memory and storage heterogeneity. Professor Kannan's work spans direct-access and near-storage file systems, memory management and heterogeneity, I/O prefetching and collaborative caching, correctness and reliability of persistent transactions, and datacenter, edge, and hazard-aware systems. He has contributed to systems such as PolyStore, OmniCache, FusionFS, CrossFS, and Trio, among others, which address challenges in storage and memory systems with an emphasis on performance and scalability. In addition to his research, he actively mentors Ph.D. students with interests in operating systems and computer architecture, particularly those with expertise in OS kernel hacking and low-level systems building. Professor Kannan also teaches advanced courses on operating systems design and theory, and he serves on program committees for leading conferences in his field. His work has been recognized with awards including the NSF CAREER Award, Google Research Scholar Award, Samsung Research Award, and best paper awards at premier conferences.

Research topics

  • Computer Science
  • Operating system
  • Artificial Intelligence
  • Computer hardware
  • Computer network
  • Distributed computing
  • Embedded system
  • Electrical engineering
  • Parallel computing
  • Database
  • Engineering
  • Programming language

Selected publications

  • Paging and the Address-Translation Problem

    ACM Transactions on Algorithms · 2025-06-04

    article

    The classical paging problem, introduced by Sleator and Tarjan in 1985, formalizes the problem of caching pages in RAM in order to minimize IOs. Their online formulation ignores the cost of address translation: Programs refer to data via virtual addresses, and these must be translated into physical locations in RAM. Although the cost of an individual address translation is much smaller than that of an IO, every memory access involves an address translation, whereas IOs can be infrequent. In practice, one can spend money to avoid paging by over-provisioning RAM; in contrast, address translation is effectively unavoidable. Thus address-translation costs can sometimes dominate paging costs, and systems must simultaneously optimize both. To mitigate the cost of address translation, all modern CPUs have translation lookaside buffers (TLBs), which are hardware caches of common address translations. What makes TLBs interesting is that a single TLB entry can potentially encode the address translation for many addresses. This is typically achieved via the use of huge pages, which translate runs of contiguous virtual addresses to runs of contiguous physical addresses. Huge pages reduce TLB misses at the cost of increasing the IOs needed to maintain contiguity in RAM. This tradeoff between TLB misses and IOs suggests that the classical paging problem does not tell the full story. This article introduces the Address-Translation Problem, which formalizes the problem of maintaining a TLB, a page table, and RAM in order to minimize the total cost of both TLB misses and IOs. We present an algorithm that achieves the benefits of huge pages for TLB misses without the downsides of huge pages for IOs.

  • Can a Client-Server Cache Tango Accelerate Disaggregated Storage?

    2025-06-23

    article · Open access · Senior author

    Disaggregated storage architectures have become a critical component in modern data centers, offering independent scaling of compute and storage. However, disaggregation introduces performance challenges, particularly due to the overhead of remote storage access. We first conduct a detailed end-to-end analysis of existing caching strategies and identify critical issues such as high server resource consumption, data duplication, lack of fair scheduling of server resources, and inefficient eviction and prefetching policies. To address these limitations, we present the preliminary design of OrcaCache, an orchestrated, unified caching framework that coordinates between clients and storage servers. OrcaCache shifts cache indexing to clients by exposing a global cache view, with the aim of reducing server CPU usage and duplication of data across caches. OrcaCache also aims to improve cache efficiency, adaptiveness, and fairness across servers and clients.

  • Analyzing and Enhancing ArckFS: An Anecdotal Example of Benefits of Artifact Evaluation

    2025-10-01

    article · Open access

    We analyze and enhance Trio and ArckFS by Zhou et al. (SOSP 2023), a high-performance NVM file system architecture and file system, respectively. A group of authors from KAIST initiated this study through a careful review of the paper and the released artifact, seeking to enhance the Trio work. Their analysis identifies (1) insufficient clarity in the paper on the handling of multi-inode operations, and (2) several implementation bugs in ArckFS that cause occasional operation failures or potential crash inconsistencies during inode creation.

  • Don't Melt Your Cache: Low-Associativity with Heat-Sink

    2025-07-16

    article

    Perhaps the most influential result in the theory of caches is the following theorem due to Sleator and Tarjan: With O(1) resource augmentation, the basic LRU eviction policy is guaranteed to be O(1)-competitive with the optimal offline policy.

  • Stateful Triage for Reliable and Secure Wildfire Monitoring at the Edge

    2025-10-06

    article · Senior author

    Wildfire monitoring is increasingly done at the network edge using small, shared (multi-tenant) sensor nodes with cameras and on-device ML. While this architecture scales, it is also susceptible to new reliability and security risks. In this paper we study how co-located workloads and side channels can undermine wildfire detection on a shared GPU edge node. Using real workloads, we show two key effects. First, inference latency is content-dependent: frames that look like fire or smoke tend to trigger heavier model paths and take longer to process. Second, a concurrent ML application (e.g., an animal detector) can slow the wildfire pipeline and cause instability by competing for VRAM and compute. Taken together, these effects mean that timing and simple GPU telemetry (utilization, memory, power) leak enough signal for a co-tenant to infer when "fire-like" frames are being processed and to time bursts of activity that further delay or drop those frames. To mitigate this problem, we argue that edge deployments should treat stateful triage as a first-class system component. Rather than emitting isolated detections, a lightweight triage layer maintains short event state, enforces basic spatiotemporal consistency (e.g., smoke before flame, persistent growth), and adapts resource use based on both event state and platform health. This paper contributes: (1) an empirical analysis of multi-tenant risks for wildfire ML at the edge, including content-dependent latency and timing side channels; (2) a security-oriented threat model for shared edge nodes; and (3) a design sketch for stateful triage that can reduce false positives, prioritize likely true events, and blunt denial-of-service and timing leaks.

  • Are Edge MicroDCs Equipped to Tackle Memory Contention?

    2025-06-23 · 1 citation

    article · Open access · Senior author

    Hazard monitoring systems rely on micro datacenters (MicroDCs) for local data processing and real-time response in resource- and energy-constrained environments. These MicroDCs often host diverse, multitenant applications (such as object detection and sensor data ingestion) that contend for shared memory. Through a case study of compute-intensive and I/O-intensive applications, we show that different applications use memory differently (e.g., heap vs. OS-managed page cache), leading to asymmetric performance degradation under memory pressure. Our findings highlight the limitations of existing OS-level resource management approaches and motivate the need for cross-layered coordination between applications and the operating system to treat all memory uses as first-class citizens and adapt to changing workload demands in MicroDCs.

  • PANGOLIN: a Comprehensive Testing Framework for Configuration-Rich Key-Value Stores

    2025-08-28

    article

    In this paper, we present Pangolin, a comprehensive testing framework for configuration-rich key-value stores. To better understand bugs in modern key-value stores and explore domain knowledge for efficiently identifying new ones, we first comprehensively study historical bugs in five mature key-value stores during the last eight years. Then, we design and implement Pangolin, motivated by insights from our bug study, which indicated that most bugs could be identified by systematically testing a small sequence of operations and configurations. Specifically, Pangolin puts these insights into practice by introducing a bounded testing strategy into a spectrum of black-box and fuzzing test procedures. Finally, we use Pangolin to find 20 bugs and reproduce 443 historical bugs in five mature key-value stores (RocksDB, LevelDB, HyperlevelDB, BadgerDB, and Redis), making it an attractive supplement to handwritten test suites.

  • Towards Application Centric Carbon Emission Management

    ACM SIGEnergy Energy Informatics Review · 2024-07-01

    article · Open access · 1st author · Corresponding

    Carbon emissions are due to application execution on a target system (operational emissions) and the production, transportation, and disposal of the system itself (embodied emissions). This paper investigates the impacts of different resource configurations in terms of available DRAM memory on the overall carbon emission of individual application executions. We first propose an application-centric carbon footprint model that considers DRAM and CPU. We then study the model using a widely-used key-value store (RocksDB) and Graph500 applications. The results for RocksDB indicate that the minimal emission configuration is application dependent and can lead to significant emission reductions compared to application-oblivious configurations that use higher DRAM capacity without improving performance. For some applications, small performance degradation may lead to substantial additional emission reductions.

  • Mosaic Pages: Big TLB Reach With Small Pages

    IEEE Micro · 2024-06-06 · 4 citations

    article

    This article introduces mosaic pages, which increase translation lookaside buffer (TLB) reach by compressing multiple, discrete translations into one TLB entry. Mosaic leverages virtual contiguity for locality, but does not use physical contiguity. Mosaic relies on recent advances in hashing theory to constrain memory mappings, in order to realize this physical address compression without reducing memory utilization or increasing swapping. Mosaic reduces TLB misses in several workloads by 6%–81%. Our results show that Mosaic's constraints on memory mappings do not harm performance: we never see conflicts before memory is 98% full in our experiments, at which point a traditional design would also likely swap. Timing and area analyses on a commercial 28-nm CMOS process indicate that the hashing required on the critical path can run at a maximum frequency of 4 GHz, indicating that a Mosaic TLB is unlikely to affect clock frequency.

  • CrossPrefetch: Accelerating I/O Prefetching for Modern Storage

    2024-04-17 · 8 citations

    article · Open access · Senior author

    We introduce CrossPrefetch, a novel cross-layered I/O prefetching mechanism that operates across the OS and a user-level runtime to achieve optimal performance. Existing OS prefetching mechanisms have rigid interfaces that give applications no information on prefetch effectiveness, suffer from high concurrency bottlenecks, and use available system memory inefficiently. CrossPrefetch addresses these limitations by dividing responsibilities between the OS and runtime, minimizing overhead and achieving fewer cache misses, lower lock contention, and higher I/O performance.

Education

  • Ph.D., Computer Science

    Rutgers, The State University of New Jersey

Awards & honors

  • NSF CAREER Award
  • Rutgers University Board of Trustees Research Fellowship for…
  • NSF grant for 'Redesigning I/O Across Heterogeneous Systems'
  • NSF award for 'A Unified Monitoring Approach to Enhancing th…
  • NSF Grant for High Performance File Systems