Yue Cheng

Assistant Professor, Computer Science · Assistant Professor, Data Science · Verified

University of Virginia · Computer Science

Active 1999–2026

h-index: 20
Citations: 1.2k
Papers: 114 (70 in the last 5 years)
Funding: $1.3M (1 active)

About

Yue Cheng is an Assistant Professor at the University of Virginia, holding a dual appointment in the School of Data Science and the Department of Computer Science. Prior to joining UVA in 2022, he served as an Assistant Professor of Computer Science at George Mason University. His research interests include distributed systems, cloud and serverless computing, high-performance computing, and operating systems. Cheng's work is driven by the complexities of modern data-intensive computer systems and aims to develop more efficient and user-friendly approaches to manage these complexities. His current research focuses on designing efficient data systems for data science, including the development of efficient stateful serverless computing systems through a full-stack approach that spans applications, platforms, and hardware, as well as building improved computing and storage systems for distributed machine learning.

Research topics

  • Computer Science
  • Computer network
  • Distributed computing
  • Artificial Intelligence
  • Machine Learning
  • Computer Security
  • Embedded system

Selected publications

  • Dual-axis myelination covariance drives the functional connectivity emergence during infancy

    Nature Communications · 2026-03-19

    article · Open access

    The mechanisms linking structural maturation to the emergence of functional networks in the perinatal brain remain unresolved. While prevailing models attribute functional connectivity to white matter myelination, neonates paradoxically exhibit adult-like resting-state networks despite profoundly immature white matter tracts. Here, we proposed gray matter myelination covariance as a critical basis of early functional connectivity emergence. We introduced a dual-axis myelination covariance framework and derived a myelination-function coupling (MFC) index specific to the newborn brain. Results revealed that the MFC exhibited distinct spatial patterns dominated by primary sensory and motor cortices, increased with age, and showed a distance-dependent strength. Crucially, neonatal MFC patterns showed a strong spatial correlation with gene expression profiles implicated in neurovascular coupling and specifically predicted later behaviors. These findings suggest that during infancy, the integration of brain function is not initially dominated by only the white matter connections but is also shaped by the synchrony of intracortical microstructure that reflects shared developmental trajectories, which offers a framework for understanding the formation of the developmental connectome.

  • λScale: Enabling Fast Scaling for Serverless Large Language Model Inference

    ArXiv.org · 2025-02-14

    preprint · Open access

    Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often face substantial model startup overhead. This poses a significant challenge in efficiently scaling model instances to accommodate dynamic, bursty workloads commonly observed in real-world inference services. In this paper, we introduce λScale, an efficient serverless inference system to achieve fast model scaling. The key idea behind λScale is to leverage high-speed RDMA networks between GPU nodes for fast model multicast, while enabling distributed inference execution during model transmission -- referred to as "execute-while-load". λScale proposes an efficient model scaling scheme, λPipe, which supports adaptive model multicast and dynamically constructs execution pipelines across receiving nodes for collaborative, distributed inference. Additionally, λScale supports efficient model management across GPU and host memory, allowing fast scaling for models across different storage tiers. Evaluation results show that λScale enables fast model scaling and effectively handles load spikes, achieving up to 5x tail-latency improvement and 31.3% cost reduction compared to state-of-the-art solutions on real-world LLM inference traces.

  • ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates

    ArXiv.org · 2025-05-18

    preprint · Open access · Senior author

    Fine-tuning large language models (LLMs) often exceeds GPU memory limits, prompting systems to offload model states to CPU memory. However, existing offloaded training frameworks like ZeRO-Offload treat all parameters equally and update the full model on the CPU, causing severe GPU stalls, where fast, expensive GPUs sit idle waiting for slow CPU updates and limited-bandwidth PCIe transfers. We present ZenFlow, a new offloading framework that prioritizes important parameters and decouples updates between GPU and CPU. ZenFlow performs in-place updates of important gradients on GPU, while asynchronously offloading and accumulating less important ones on CPU, fully overlapping CPU work with GPU computation. To scale across GPUs, ZenFlow introduces a lightweight gradient selection method that exploits a novel spatial and temporal locality property of important gradients, avoiding costly global synchronization. ZenFlow achieves up to 5x end-to-end speedup, 2x lower PCIe traffic, and reduces GPU stalls by over 85 percent, all while preserving accuracy.

  • NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs

    arXiv (Cornell University) · 2025-03-26

    preprint · Open access · Senior author

    Interactive notebook programming is universal in modern ML and AI workflows, with interactive deep learning training (IDLT) emerging as a dominant use case. To ensure responsiveness, platforms like Jupyter and Colab reserve GPUs for long-running notebook sessions, despite their intermittent and sporadic GPU usage, leading to extremely low GPU utilization and prohibitively high costs. In this paper, we introduce NotebookOS, a GPU-efficient notebook platform tailored for the unique requirements of IDLT. NotebookOS employs replicated notebook kernels with Raft-synchronized replicas distributed across GPU servers. To optimize GPU utilization, NotebookOS oversubscribes server resources, leveraging high interarrival times in IDLT workloads, and allocates GPUs only during active cell execution. It also supports replica migration and automatic cluster scaling under high load. Altogether, this design enables interactive training with minimal delay. In evaluation on production workloads, NotebookOS saved over 1,187 GPU hours in 17.5 hours of real-world IDLT, while significantly improving interactivity.

  • Centralization in the Decentralized Web: Challenges and Opportunities in IPFS Data Management

    2025-04-22 · 6 citations

    article · Open access

    The InterPlanetary File System (IPFS) is a pioneering effort for Web 3.0, well-known for its decentralized infrastructure. However, some recent studies have shown that IPFS exhibits a high degree of centralization and has integrated centralized components for improved performance. While this change contradicts the core decentralized ethos of IPFS and introduces risks of hurting the data replication level and thus availability, it also opens some opportunities for better data management and cost savings through deduplication.

  • Fine-tuning and electronic modulation of AuPdCu nanoflowers assembled with nanowires for robust ethanol oxidation reaction performance

    Nanoscale · 2025-12-15

    article

    CO* intermediate on AuPdCu NPs is enhanced, thereby promoting the EOR process along the C1 pathway. This ternary metal fine-tuning alloying approach presents a viable route for fabricating highly active and durable EOR materials.

  • Strong electronic interactions stem from lattice strain control in PdSnCu nanochains for robust electrocatalytic ethanol oxidation

    Materials Research Bulletin · 2025-11-25 · 1 citation

    article

  • The Decentralization Dilemma: Performance Trade-Offs in IPFS and Breakpoints

    2025-10-28

    article · Open access

    Web 3.0 is redefining the current Web (Web 2.0) with a focus on data and governance decentralization. The InterPlanetary File System (IPFS) exemplifies this shift. However, it faces a trade-off between decentralization and performance: prior studies have shown IPFS's performance degradations but fail to diagnose root causes or deliver actionable fixes.

  • NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs

    2025-12-11

    article · Open access · Senior author

    Interactive notebook programming is universal in modern ML and AI workflows, with interactive deep learning training (IDLT) emerging as a dominant use case. To ensure responsiveness, platforms like Jupyter and Colab reserve GPUs for long-running notebook sessions, despite their intermittent and sporadic GPU usage, leading to extremely low GPU utilization and prohibitively high costs. In this paper, we introduce NotebookOS, a GPU-efficient notebook platform tailored for the unique requirements of IDLT. NotebookOS employs replicated notebook kernels with Raft-synchronized replicas distributed across GPU servers. To optimize GPU utilization, NotebookOS oversubscribes server resources, leveraging high inter-arrival times in IDLT workloads, and allocates GPUs only during active cell execution. It also supports replica migration and automatic cluster scaling under high load. Altogether, this design enables interactive training with minimal delay. In evaluation on production workloads, NotebookOS saved over 1,187 GPU hours in 17.5 hours of real-world IDLT, while significantly improving interactivity.

  • ZipLLM: Efficient LLM Storage via Model-Aware Synergistic Data Deduplication and Compression

    arXiv (Cornell University) · 2025-04-30

    preprint · Open access · Senior author

    Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques -- such as deduplication and compression -- are either LLM-oblivious or not compatible with each other, limiting data reduction effectiveness. Our large-scale characterization study across all publicly available Hugging Face LLM repositories reveals several key insights: (1) fine-tuned models within the same family exhibit highly structured, sparse parameter differences suitable for delta compression; (2) bitwise similarity enables LLM family clustering; and (3) tensor-level deduplication is better aligned with model storage workloads, achieving high data reduction with low metadata overhead. Building on these insights, we design BitX, an effective, fast, lossless delta compression algorithm that compresses XORed difference between fine-tuned and base LLMs. We build ZipLLM, a model storage reduction pipeline that unifies tensor-level deduplication and lossless BitX compression. By synergizing deduplication and compression around LLM family clustering, ZipLLM reduces model storage consumption by 54%, over 20% higher than state-of-the-art deduplication and compression approaches.

Frequent coauthors

  • Ali Anwar

    37 shared
  • Ali R. Butt

    Virginia Tech

    32 shared
  • Gaoyan Zhang

    Tianjin University

    16 shared
  • Lixiang Huang

    Fujian Women and Children Hospital

    16 shared
  • Xiaodong Zhang

    University of Electronic Science and Technology of China

    16 shared
  • Jia-Min Zhou

    Tianjin Medical University

    16 shared
  • Shen Wen

    Tianjin First Center Hospital

    16 shared
  • Yuexuan Li

    University of Minnesota Medical Center

    16 shared