Huaicheng Li

Assistant Professor (Verified)

Virginia Tech · Computer Science

Active 2016–2026

h-index: 15

Citations: 869

Papers: 36 (27 in the last 5 years)

Funding: not listed


About

Huaicheng Li is an Assistant Professor in the Department of Computer Science at Virginia Tech. He holds a Ph.D. in computer science from the University of Chicago, obtained in 2020, and an M.S. in computer science from the same institution, earned in 2018. He completed his B.S. in computer science and technology at Wuhan University, China, in 2013. His research interests include operating systems, storage systems, memory systems, and systems architecture. He is based at the Gilbert Place location in Blacksburg, VA, and can be contacted via email at huaicheng@cs.vt.edu or by phone at (540) 231-4482.

Research topics

Computer Science
Operating system
Embedded system
Artificial Intelligence
Computer hardware
Telecommunications
Distributed computing

Selected publications

The impact of the three rights separation of rural homestead reform on farmers’ economic welfare: a county-level macro analysis
Frontiers in Sustainable Food Systems · 2026-04-10
article · Open access · Senior author
The three rights separation of rural homestead reform (TRSRH) is a key component of China’s ongoing innovation in rural land institutions. Its central aim is to optimize the bundle of rights associated with rural homesteads through the “separation of three rights,” thereby improving the efficiency of land resource allocation and enhancing farmers’ welfare. Using panel data for 2,545 counties from 2000 to 2022, this study employs a staggered difference-in-differences model to systematically evaluate the impact of the reform pilots on farmers’ economic well-being. The results show that the reform significantly increases rural residents’ per capita disposable income, suggesting its positive role in expanding property-based income, facilitating factor mobility, and strengthening institutional guarantees. Further analysis indicates that the reform’s effect is more pronounced in regions with higher levels of economic development, more active population mobility, or stronger locational advantages, and that low-income and low-welfare groups benefit to a greater extent. Mechanism tests reveal that the expansion of non-farm employment opportunities and improvements in infrastructure and public service provision are important channels through which the reform enhances farmers’ welfare. These findings provide useful policy implications for refining the rural homestead system and advancing rural revitalization.
Performance Predictability in Heterogeneous Memory
2026-03-10
article · Open access · Senior author
Heterogeneous memory combining DRAM and CXL exhibits variable performance, yet existing metrics correlate weakly with actual slowdown. We present CAMP, a principled framework for predicting CXL-induced slowdown. Our key insight is that a DRAM run (plus a CXL run for bandwidth-bound workloads) exposes the causal microarchitectural pressure points where CXL latency translates into additional processor stall cycles. CAMP captures these signals using 12 performance counters to analytically decompose slowdown into three orthogonal components: demand reads, cache/prefetching, and stores. CAMP also introduces a closed-form model for software-based weighted interleaving that predicts performance across DRAM–CXL ratios. Across 265 workloads on NUMA and three CXL devices, CAMP achieves 91–97% prediction accuracy within 10% absolute error. We demonstrate that these models enable practical system policies, including “Best-shot” interleaving and colocated workload placement, improving performance by up to 21% and 23% over existing tiering and colocation approaches.
Carbon trading system and low-carbon economic efficiency in China: considering the roles of international capital flows and regional coopetition
Environment Development and Sustainability · 2026-03-15
article · Senior author · Corresponding author
PACT: A Criticality-First Design for Tiered Memory
2026-03-10
article · Open access · Senior author
Tiered memory systems typically place pages based on access frequency (hotness), yet frequency alone fails to capture the true performance impact. We present PACT, an online, page-granular tiered memory design that elevates performance criticality to a first-class design principle. At its core is Per-page Access Criticality (PAC), a fine-grained metric that quantifies each page's contribution to application performance rather than merely counting accesses. PACT profiles PAC online using a lightweight analytical model that uniquely decomposes per-tier memory-level parallelism via hardware queue occupancy counters, enabling direct CPU stall attribution to individual pages. To handle highly skewed PAC distributions, PACT employs PAC-centric migration policies: eager demotion and adaptive promotion, to dynamically place performance-critical pages in DRAM. Across 13 workloads, PACT achieves up to 61% performance improvement over the best of 7 state-of-the-art tiering designs with up to 50× fewer migrations.
Systematic CXL Memory Characterization and Performance Analysis at Scale
2025-03-27 · 26 citations
article · Open access · Senior author
Compute Express Link (CXL) has emerged as a pivotal interconnect for memory expansion. Despite its potential, the performance implications of CXL across devices, latency regimes, processors, and workloads remain underexplored. We present Melody, a framework for systematic characterization and analysis of CXL memory performance. Melody builds on an extensive evaluation spanning 265 workloads, 4 real CXL devices, 7 latency levels, and 5 CPU platforms. Melody yields many insights: workload sensitivity to sub-μs CXL latencies (140–410 ns), the first disclosure of CXL tail latencies, CPU tolerance to CXL latencies, a novel approach (SPA) for pinpointing CXL bottlenecks, and CPU prefetcher inefficiencies under CXL.
SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
2025-02-28 · 4 citations
article · Open access · Senior author
Cloud service providers heavily colocate high-priority, latency-sensitive (LS), and low-priority, best-effort (BE) DNN inference services on the same GPU to improve resource utilization in data centers. Among the critical shared GPU resources, there has been very limited analysis on the dynamic allocation of compute units and VRAM bandwidth, mainly for two reasons: (1) The native GPU resource management solutions are either hardware-specific, or unable to dynamically allocate resources to different tenants, or both; (2) NVIDIA doesn't expose interfaces for VRAM bandwidth allocation, and the software stack and VRAM channel architectures are black-box, both of which limit the software-level resource management. These drive prior work to design either conservative sharing policies detrimental to throughput, or static resource partitioning only applicable to a few GPU models.
New media environment, green technological innovation and corporate productivity: Evidence from listed companies in China
Energy Economics · 2024-02-07 · 55 citations
article · Senior author
SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
arXiv (Cornell University) · 2024-07-19 · 2 citations
preprint · Open access
Cloud service providers heavily colocate high-priority, latency-sensitive (LS), and low-priority, best-effort (BE) DNN inference services on the same GPU to improve resource utilization in data centers. Among the critical shared GPU resources, there has been very limited analysis on the dynamic allocation of compute units and VRAM bandwidth, mainly for two reasons: (1) The native GPU resource management solutions are either hardware-specific, or unable to dynamically allocate resources to different tenants, or both; (2) NVIDIA doesn't expose interfaces for VRAM bandwidth allocation, and the software stack and VRAM channel architectures are black-box, both of which limit the software-level resource management. These drive prior work to design either conservative sharing policies detrimental to throughput, or static resource partitioning only applicable to a few GPU models. To bridge this gap, this paper proposes SGDRC, a fully software-defined dynamic VRAM bandwidth and compute unit management solution for concurrent DNN inference services. SGDRC aims at guaranteeing service quality, maximizing the overall throughput, and providing general applicability to NVIDIA GPUs. SGDRC first reveals a general VRAM channel hash mapping architecture of NVIDIA GPUs through comprehensive reverse engineering and eliminates VRAM channel conflicts using software-level cache coloring. SGDRC applies bimodal tensors and tidal SM masking to dynamically allocate VRAM bandwidth and compute units, and guides the allocation of resources based on offline profiling. We evaluate 11 mainstream DNNs with real-world workloads on two NVIDIA GPUs. The results show that compared with the state-of-the-art GPU sharing solutions, SGDRC achieves the highest SLO attainment rates (99.0% on average), and improves overall throughput by up to 1.47x and BE job throughput by up to 2.36x.
Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory
arXiv (Cornell University) · 2024-10-01
preprint · Open access
Tiered memory, built upon a combination of fast memory and slow memory, provides a cost-effective solution to meet ever-increasing requirements from emerging applications for large memory capacity. Reducing the size of fast memory is valuable to improve memory utilization in production and reduce production costs because fast memory tends to be expensive. However, deciding the fast memory size is challenging because there is a complex interplay between application characterization and the overhead of page migration used to mitigate the impact of limited fast memory capacity. In this paper, we introduce a system, Tuna, to decide fast memory size based on modeling of page migration. Tuna uses micro-benchmarking to model the impact of page migration on application performance using three metrics. Tuna decides the fast memory size based on offline modeling results and limited information on workload telemetry. Evaluating with common big-memory applications and using 5% as the performance loss target, we show that Tuna in combination with a page management system (TPP) saves fast memory by 8.5% on average (up to 16%). This is in contrast to the 5% saving in fast memory reported by Microsoft Pond for the same workloads (BFS and SSSP) and the same performance loss target.
Oxygen vacancies nanoarchitectonics in BiVO4/WO3 heterostructured photoanode for effective berberine wastewater purification and electricity generation
Journal of the Taiwan Institute of Chemical Engineers · 2024-04-30 · 10 citations
article

Frequent coauthors

Haryadi S. Gunawi
University of Chicago
16 shared
Mingzhe Hao
China Three Gorges University
13 shared
Xing Lin
NetApp (United States)
8 shared
Mark D. Hill
Microsoft (United States)
7 shared
Daniel S. Berger
7 shared
Andrew Baptist
University of Utah
6 shared
Ricardo Bianchini
6 shared
Stanko Novaković
5 shared

Education

PhD, Computer Science
University of Chicago
2020
