Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Cheng Li

Assistant Professor of Chinese Studies · Verified

Carnegie Mellon University · Languages, Cultures & Applied Linguistics

Active 1996–2026

h-index: 44
Citations: 9.7k
Papers: 280 (140 in the last 5 years)
Funding

About

Cheng Li is an Assistant Professor of Chinese Studies at Carnegie Mellon University, specializing in the literary and cultural history of modern China. He investigates the rise of modern China through cultural and historical perspectives, focusing on areas such as the environment, the military, and infrastructure. After receiving his PhD in modern China studies from Yale University in 2022, he joined Carnegie Mellon University as an assistant professor. His primary research takes an ecocritical approach to modern Chinese environmental literature, film, and history. His research interests also include science fiction, infrastructure studies, and military studies. His scholarly work has been published in several academic journals, and his first book, "Contested Environmentalisms: Trees and the Making of Modern China," was published by Stanford University Press in 2025. The dissertation version of this book earned the Marston Anderson Prize for the best dissertation in the East Asian department at Yale University in 2022.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Political Science
  • Business
  • Geography
  • Medicine
  • Economics
  • Econometrics
  • Operations research
  • Actuarial science
  • Engineering
  • Statistics
  • Epistemology
  • Mathematics
  • Environmental health
  • World Wide Web
  • Meteorology
  • Data science
  • Philosophy

Selected publications

  • TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

    arXiv (Cornell University) · 2026-04-07

    Article · Open access

    We introduce TFRBench, the first benchmark designed to evaluate the reasoning capabilities of forecasting systems. Traditionally, time-series forecasting has been evaluated solely on numerical accuracy, treating foundation models as "black boxes." Unlike existing benchmarks, TFRBench provides a protocol for evaluating the reasoning generated by forecasting systems, specifically their analysis of cross-channel dependencies, trends, and external events. To enable this, we propose a systematic multi-agent framework that utilizes an iterative verification loop to synthesize numerically grounded reasoning traces. Spanning ten datasets across five domains, our evaluation confirms that this reasoning is causally effective and useful for evaluation, and that prompting LLMs with our generated traces significantly improves forecasting accuracy compared to direct numerical prediction (e.g., avg. ~40.2% → 56.6%), validating the quality of our reasoning. Conversely, benchmarking experiments reveal that off-the-shelf LLMs consistently struggle with both reasoning (lower LLM-as-a-Judge scores) and numerical forecasting, frequently failing to capture domain-specific dynamics. TFRBench thus establishes a new standard for interpretable, reasoning-based evaluation in time-series forecasting. Our benchmark is available at: https://tfrbench.github.io

  • Reasoning-Aware Training for Time Series Forecasting

    arXiv (Cornell University) · 2026-05-09

    Preprint · Open access

    Time Series Foundation Models (TSFMs) excel at numerical forecasting but operate as black boxes lacking qualitative reasoning. Conversely, applying LLMs directly to temporal data introduces a modality gap: text tokenizers fragment continuous numerical values, degrading mathematical relationships and exploding sequence lengths, leading to computational overhead. To resolve this, we introduce STRIDE (Strategic Time-series Reasoning Injected via Distilled Embeddings), a novel framework natively integrating LLM reasoning into the continuous embedding space of TSFMs. Instead of discrete tokens, STRIDE distills reasoning traces into a lightweight LLM, dynamically projecting its mean-pooled hidden states as a cross-modal prior into the target numerical encoder. The architecture is jointly optimized using cross-entropy and quantile losses. Evaluations demonstrate STRIDE establishes state-of-the-art numerical forecasting on GIFT-Eval (0.674 MASE, 0.454 CRPS) compared to TSFMs and exhibits superior in-domain and out-of-domain numerical as well as reasoning performance on TFRBench. Specifically, STRIDE acts as a plug-and-play enhancement, consistently improving diverse TSFMs (e.g., Chronos-2, Timer-S1) across various LLM configurations. Thus, injecting semantic reasoning as a continuous prior equips TSFMs with human-interpretable reasoning while fundamentally improving predictive accuracy.

  • Nexus: An Agentic Framework for Time Series Forecasting

    ArXiv.org · 2026-05-14

    Article · Open access

    Time series forecasting is not just numerical extrapolation, but often requires reasoning with unstructured contextual data such as news or events. While specialized Time Series Foundation Models (TSFMs) excel at forecasting based on numerical patterns, they remain unaware of real-world textual signals. Conversely, while LLMs are emerging as zero-shot forecasters, their performance remains uneven across domains and contextual grounding. To bridge this gap, we introduce Nexus, a multi-agent forecasting framework that decomposes prediction into specialized stages: isolating macro-level and micro-level temporal fluctuations, and integrating contextual information when available before synthesizing a final forecast. This decomposition enables Nexus to adapt from seasonal signals to volatile, event-driven information without relying on external statistical anchors or monolithic prompting. We show that current-generation LLMs possess substantially stronger intrinsic forecasting ability than previously recognized, depending critically on how numerical and contextual reasoning are organized. Evaluated on data strictly succeeding LLM knowledge cutoffs spanning Zillow real estate metrics and volatile stock market equities, Nexus consistently matches or outperforms state-of-the-art TSFMs and strong LLM baselines. Beyond numerical accuracy, Nexus produces high-quality reasoning traces that explicitly show the fundamental drivers behind each forecast. Our results establish that real-world forecasting is an agentic reasoning problem extending well beyond only sequence modeling.

  • LEAF: A Living Benchmark for Event-Augmented Forecasting

    arXiv (Cornell University) · 2026-05-09

    Preprint · Open access

    Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability while mitigating pre-training data contamination, several living benchmarks have been proposed. However, existing benchmarks either lack the multidimensional events essential for accurate forecasting due to data scarcity, or focus on relatively closed environments. To assess the predictive capabilities of LLMs in complex, real-world scenarios, we propose LEAF, the first living benchmark for event-augmented forecasting tasks, including future event probabilities, trend and time series forecasting. LEAF utilizes a recursive retrieval agent system paired with dual-agent cross-validation to provide comprehensive and relevant auxiliary text for forecasting. Evaluating state-of-the-art proprietary and open-weight LLMs, we find that these models can leverage signals extracted from complex events to enhance predictive performance. In the stock domain, we find that LLMs achieve better performance on equities they confidently identify as more predictable. Furthermore, the events demonstrate a strong correlation with the target equities. To this end, LEAF provides a necessary, dynamically updating testbed to continuously track and drive progress in event-driven forecasting tasks.

  • Characterization of the Specific Binding Between Aptamers and Cytochrome c With Pressure‐Assisted Capillary Electrophoresis Frontal Analysis

    Electrophoresis · 2025-09-01 · 2 citations

    Article · Open access

    Cytochrome c (cyt c) is a heme protein located in the mitochondrial intermembrane space. Because the release of cyt c is a highly specific event in apoptotic signaling, it can serve as an apoptosis‐related marker. To date, three frequently used aptamers for cyt c (Apt40, Apt61, and Apt76) have been selected and applied in the field of sensing. The response of these aptamers is not clear, partly because of their weak affinity and nonspecific binding inherent to the system. In this study, pressure‐assisted capillary electrophoresis frontal analysis (PACE‐FA) was used to characterize the interactions between the aptamers and cyt c, and an electrophoretic mobility‐based correction was introduced to obtain accurate binding constants. A nonlinear curve‐fitting approach was used for evaluating specific binding interactions in the presence of nonspecific binding. Apt76 was found to bind specifically to cyt c, exhibiting the highest binding constant (1.53 × 10⁶ M⁻¹), and all three aptamers interacted with cyt c at 1:1 stoichiometry. Fluorescence titrations were performed to verify the effectiveness of the reference‐free PACE‐FA method. This study demonstrates that specific binding between biomolecules has different characteristics compared to nonspecific binding and that the PACE‐FA method can be widely used in the evaluation of biological macromolecular interactions.

  • Deciphering acquired resistance mechanisms to sustained auxin-inducible protein degradation in cells and mice

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-23

    Preprint · Open access · Senior author · Corresponding author

    Targeted protein degradation is a favorable strategy for studying the immediate downstream effects of protein loss-of-function. An appealing platform among these technologies is the auxin-inducible degron (AID) system. Although this system has been applied extensively to cell and animal models, degradation resistance to long-term auxin treatment has not been studied. With the advent of the new AID2 system, cellular toxicity due to the high concentrations of auxin required in the original AID1 system is no longer a concern, making it possible to study protein degradation over extended periods. In this study, we derived multiple miniAID-tagged knock-in human cell lines and a Ctcf-miniAID knock-in mouse strain to investigate mechanisms of degradation resistance. We revealed four independent resistance mechanisms, including a nonsense mutation in the CTCF coding sequence that removed the miniAID peptide, a missense point mutation in the miniAID coding region that disrupted ubiquitin complex targeting, and silencing of the OsTIR1 adaptor protein. Resistance to auxin degradation was also acquired in mouse primary Ctcf^miniAID/miniAID knock-in B-ALL cells through missense mutations of the OsTIR1 (F74G) protein in vivo and ex vivo. In summary, our innovative study expands our understanding of the AID system and cautions careful consideration of design for future applications in mammalian systems.

Frequent coauthors

  • Ying Jin

    Affiliated Hospital of Nantong University

    87 shared
  • Junjie Gu

    Sichuan University

    57 shared
  • Yu Ma

    Fudan University

    50 shared
  • Tomas Pfister

    50 shared
  • Barnabás Póczos

    33 shared
  • Ying Yang

    Nanjing University

    29 shared
  • Judith Hyle

    St. Jude Children's Research Hospital

    27 shared
  • Shaela Wright

    St. Jude Children's Research Hospital

    27 shared

Education

  • PhD, Developmental Biology

    Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences

    2009
  • Bachelor, Department of Life Sciences

    Central China Normal University

    2004

Awards & honors

  • Marston Anderson Prize for the best dissertation in the East…
  • Falk Research Grant, Carnegie Mellon University (2023)
  • East Asian Prize Fellowship, Yale University (2021–2022)
  • Environmental Humanities Certificate, Yale University (2020)