Yiyun Li

Robert F. Goheen Professor in the Humanities; Professor of Creative Writing · Verified

Princeton University · Theatre

Active 2007–2026

h-index 37

Citations 7.3k

Papers 199 · 94 last 5y

Funding—



About

Yiyun Li is the Robert F. Goheen Professor in the Humanities and a Professor of Creative Writing at Princeton University. She is an acclaimed author whose works include the memoir Things in Nature Merely Grow, which won the 2026 Pulitzer Prize for memoir, as well as other notable books such as Wednesday's Child, The Book of Goose, Where Reasons End, Dear Friend, from My Life I Write to You in Your Life, and Tolstoy Together, 85 Days of War and Peace with Yiyun Li. Her work has been translated into more than twenty languages and has received numerous honors and awards, including a MacArthur Foundation Fellowship, a Guggenheim Fellowship, a Windham Campbell Prize, the Andrew Carnegie Medal, and the International Writer Award from the Royal Society of Literature. She is a member of both the American Academy of Arts and Sciences and the American Academy of Arts and Letters, and her literary contributions extend into film, with her short story adapted into the award-winning film A Thousand Years of Good Prayers. Li's research and creative work focus on contemporary fiction, literary storytelling, and the exploration of human experiences through her writing.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Mathematical optimization
Mathematics
Parallel computing
Mathematical analysis
Applied mathematics
Statistics
Algorithm
Combinatorics
Programming language
Chemistry

Selected publications

Expected Shortfall Regression via Optimization
arXiv (Cornell University) · 2026-02-21
preprint · Open access · 1st author · Corresponding
To provide a comprehensive summary of the tail distribution, the expected shortfall is defined as the average over the tail above (or below) a certain quantile of the distribution. The expected shortfall regression captures the heterogeneous covariate-response relationship and describes the covariate effects on the tail of the response distribution. Based on a critical observation that the superquantile regression from the operations research literature does not coincide with the expected shortfall regression, we propose and validate a novel optimization-based approach for the linear expected shortfall regression, without additional assumptions on the conditional quantile models. While the proposed loss function is implicitly defined, we provide a prototype implementation of the proposed approach with some initial expected shortfall estimators based on binning techniques. With practically feasible initial estimators, we establish the consistency and the asymptotic normality of the proposed estimator. The proposed approach achieves heterogeneity-adaptive weights and therefore often offers efficiency gain over existing linear expected shortfall regression approaches in the literature, as demonstrated through simulation studies.
Publisher DOI
Expected Shortfall Regression via Optimization
arXiv (Cornell University) · 2026-01-01
article · Open access · 1st author · Corresponding
Publisher OA PDF
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
SSRN Electronic Journal · 2025-01-01 · 1 citations
preprint · Open access · Senior author
Publisher DOI
Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
ArXiv.org · 2025-07-17
preprint · Open access · Senior author
The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically actively change during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, the understanding of the behavior of the NTK eigenvectors during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
Publisher OA PDF DOI
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
arXiv (Cornell University) · 2025-01-13
preprint · Open access
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to "How are the largest LLMs trained?" remains unclear within the community. The implementation details for such high-capacity models are often protected due to business considerations associated with their high cost. This lack of transparency prevents LLM researchers from leveraging valuable insights from prior experience, e.g., "What are the best practices for addressing loss spikes?" The LLM360 K2 project addresses this gap by providing full transparency and access to resources accumulated during the training of LLMs at the largest scale. This report highlights key elements of the K2 project, including our first model, K2 DIAMOND, a 65 billion-parameter LLM that surpasses LLaMA-65B and rivals LLaMA2-70B, while requiring fewer FLOPs and tokens. We detail the implementation steps and present a longitudinal analysis of K2 DIAMOND's capabilities throughout its training process. We also outline ongoing projects such as TXT360, setting the stage for future models in the series. By offering previously unavailable resources, the K2 project also resonates with the 360-degree OPEN SOURCE principles of transparency, reproducibility, and accessibility, which we believe are vital in the era of resource-intensive AI research.
Publisher OA PDF DOI
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
SSRN Electronic Journal · 2025-01-01 · 3 citations
article · Open access
Publisher DOI
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
SSRN Electronic Journal · 2025-01-01 · 9 citations
article · Open access · Senior author
Publisher DOI
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
SSRN Electronic Journal · 2025-01-01 · 6 citations
article · Open access · Senior author
Publisher DOI
$\mathtt{M^3VIR}$: A Large-Scale Multi-Modality Multi-View Synthesized Benchmark Dataset for Image Restoration and Content Creation
ArXiv.org · 2025-09-21
preprint · Open access · 1st author · Corresponding
The gaming and entertainment industry is rapidly evolving, driven by immersive experiences and the integration of generative AI (GAI) technologies. Training such models effectively requires large-scale datasets that capture the diversity and context of gaming environments. However, existing datasets are often limited to specific domains or rely on artificial degradations, which do not accurately capture the unique characteristics of gaming content. Moreover, benchmarks for controllable video generation remain absent. To address these limitations, we introduce $\mathtt{M^3VIR}$, a large-scale, multi-modal, multi-view dataset specifically designed to overcome the shortcomings of current resources. Unlike existing datasets, $\mathtt{M^3VIR}$ provides diverse, high-fidelity gaming content rendered with Unreal Engine 5, offering authentic ground-truth LR-HR paired and multi-view frames across 80 scenes in 8 categories. It includes $\mathtt{M^3VIR\_MR}$ for super-resolution (SR), novel view synthesis (NVS), and combined NVS+SR tasks, and $\mathtt{M^3VIR\_{MS}}$, the first multi-style, object-level ground-truth set enabling research on controlled video generation. Additionally, we benchmark several state-of-the-art SR and NVS methods to establish performance baselines. While no existing approaches directly handle controlled video generation, $\mathtt{M^3VIR}$ provides a benchmark for advancing this area. By releasing the dataset, we aim to facilitate research in AI-powered restoration, compression, and controllable content generation for next-generation cloud gaming and entertainment.
Publisher OA PDF DOI
Research on Load Forecasting Technology of Power System Based on Artificial Intelligence
2024-05-17
article · Senior author
With the development of new-type power systems, artificial intelligence plays an increasingly important role in the stability and security of power grids, and power load forecasting is one of its main applications. A city's power load usually comprises three types: industrial, commercial, and civil. Because the commercial load is relatively fixed, forecasting industrial and civil electricity use is more important. This paper focuses on the first step of load forecasting: load classification. Based on a comparison of the characteristics of different types of power curves, a convolutional neural network load classification method based on data enhancement is proposed. The results show that data enhancement effectively improves classification accuracy and that a convolutional neural network classifies power loads well.
Publisher DOI

Frequent coauthors

Zeyuan Allen-Zhu
44 shared
Sébastien Bubeck
33 shared
Yingyu Liang
22 shared
Yin Tat Lee
21 shared
Mark Sellke
Harvard University
21 shared
Tengyu Ma
18 shared
Andrej Risteski
17 shared
Sanjeev Arora
9 shared

Labs

Yiyun Li's Creative Writing Lab · PI

Awards & honors

MacArthur Foundation Fellowship
Guggenheim Fellowship
Windham Campbell Prize
2026 Andrew Carnegie Medal
2021 Literature Award from the American Academy of Arts and…
