
Yiyun Li
Robert F. Goheen Professor in the Humanities; Professor of Creative Writing · Princeton University · Theatre
Active 2007–2026
About
Yiyun Li is the Robert F. Goheen Professor in the Humanities and a Professor of Creative Writing at Princeton University. She is an acclaimed author whose works include the memoir Things in Nature Merely Grow, which won the 2026 Pulitzer Prize for memoir, as well as other notable books such as Wednesday's Child, The Book of Goose, Where Reasons End, Dear Friend, from My Life I Write to You in Your Life, and Tolstoy Together, 85 Days of War and Peace with Yiyun Li. Her work has been translated into more than twenty languages and has received numerous honors and awards, including a MacArthur Foundation Fellowship, a Guggenheim Fellowship, a Windham Campbell Prize, the Andrew Carnegie Medal, and the International Writer Award from the Royal Society of Literature. She is a member of both the American Academy of Arts and Sciences and the American Academy of Arts and Letters, and her literary contributions extend into film, with her short story adapted into the award-winning film A Thousand Years of Good Prayers. Li's research and creative work focus on contemporary fiction, literary storytelling, and the exploration of human experiences through her writing.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Mathematical optimization
- Mathematics
- Parallel computing
- Mathematical analysis
- Applied mathematics
- Statistics
- Algorithm
- Combinatorics
- Programming language
- Chemistry
Selected publications
Expected Shortfall Regression via Optimization
arXiv (Cornell University) · 2026-02-21
preprint · Open access · 1st author · Corresponding
To provide a comprehensive summary of the tail distribution, the expected shortfall is defined as the average over the tail above (or below) a certain quantile of the distribution. The expected shortfall regression captures the heterogeneous covariate-response relationship and describes the covariate effects on the tail of the response distribution. Based on a critical observation that the superquantile regression from the operations research literature does not coincide with the expected shortfall regression, we propose and validate a novel optimization-based approach for the linear expected shortfall regression, without additional assumptions on the conditional quantile models. While the proposed loss function is implicitly defined, we provide a prototype implementation of the proposed approach with some initial expected shortfall estimators based on binning techniques. With practically feasible initial estimators, we establish the consistency and the asymptotic normality of the proposed estimator. The proposed approach achieves heterogeneity-adaptive weights and therefore often offers efficiency gain over existing linear expected shortfall regression approaches in the literature, as demonstrated through simulation studies.
Expected Shortfall Regression via Optimization
arXiv (Cornell University) · 2026-01-01
article · Open access · 1st author · Corresponding
To provide a comprehensive summary of the tail distribution, the expected shortfall is defined as the average over the tail above (or below) a certain quantile of the distribution. The expected shortfall regression captures the heterogeneous covariate-response relationship and describes the covariate effects on the tail of the response distribution. Based on a critical observation that the superquantile regression from the operations research literature does not coincide with the expected shortfall regression, we propose and validate a novel optimization-based approach for the linear expected shortfall regression, without additional assumptions on the conditional quantile models. While the proposed loss function is implicitly defined, we provide a prototype implementation of the proposed approach with some initial expected shortfall estimators based on binning techniques. With practically feasible initial estimators, we establish the consistency and the asymptotic normality of the proposed estimator. The proposed approach achieves heterogeneity-adaptive weights and therefore often offers efficiency gain over existing linear expected shortfall regression approaches in the literature, as demonstrated through simulation studies.
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
SSRN Electronic Journal · 2025-01-01 · 1 citation
preprint · Open access · Senior author
Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
ArXiv.org · 2025-07-17
preprint · Open access · Senior author
The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. NTKs typically actively change during training and are related to feature learning. In parallel, recent work on Gradient Descent (GD) has found a phenomenon called Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of such eigenvalue behavior in depth, the understanding of the behavior of the NTK eigenvectors during EoS is still missing. This paper examines the dynamics of NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to have greater alignment with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch
arXiv (Cornell University) · 2025-01-13
preprint · Open access
We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to "How are the largest LLMs trained?" remains unclear within the community. The implementation details for such high-capacity models are often protected due to business considerations associated with their high cost. This lack of transparency prevents LLM researchers from leveraging valuable insights from prior experience, e.g., "What are the best practices for addressing loss spikes?" The LLM360 K2 project addresses this gap by providing full transparency and access to resources accumulated during the training of LLMs at the largest scale. This report highlights key elements of the K2 project, including our first model, K2 DIAMOND, a 65 billion-parameter LLM that surpasses LLaMA-65B and rivals LLaMA2-70B, while requiring fewer FLOPs and tokens. We detail the implementation steps and present a longitudinal analysis of K2 DIAMOND's capabilities throughout its training process. We also outline ongoing projects such as TXT360, setting the stage for future models in the series. By offering previously unavailable resources, the K2 project also resonates with the 360-degree OPEN SOURCE principles of transparency, reproducibility, and accessibility, which we believe are vital in the era of resource-intensive AI research.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
SSRN Electronic Journal · 2025-01-01 · 3 citations
article · Open access
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
SSRN Electronic Journal · 2025-01-01 · 9 citations
article · Open access · Senior author
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
SSRN Electronic Journal · 2025-01-01 · 6 citations
article · Open access · Senior author
ArXiv.org · 2025-09-21
preprint · Open access · 1st author · Corresponding
The gaming and entertainment industry is rapidly evolving, driven by immersive experiences and the integration of generative AI (GAI) technologies. Training such models effectively requires large-scale datasets that capture the diversity and context of gaming environments. However, existing datasets are often limited to specific domains or rely on artificial degradations, which do not accurately capture the unique characteristics of gaming content. Moreover, benchmarks for controllable video generation remain absent. To address these limitations, we introduce $\mathtt{M^3VIR}$, a large-scale, multi-modal, multi-view dataset specifically designed to overcome the shortcomings of current resources. Unlike existing datasets, $\mathtt{M^3VIR}$ provides diverse, high-fidelity gaming content rendered with Unreal Engine 5, offering authentic ground-truth LR-HR paired and multi-view frames across 80 scenes in 8 categories. It includes $\mathtt{M^3VIR\_MR}$ for super-resolution (SR), novel view synthesis (NVS), and combined NVS+SR tasks, and $\mathtt{M^3VIR\_{MS}}$, the first multi-style, object-level ground-truth set enabling research on controlled video generation. Additionally, we benchmark several state-of-the-art SR and NVS methods to establish performance baselines. While no existing approaches directly handle controlled video generation, $\mathtt{M^3VIR}$ provides a benchmark for advancing this area. By releasing the dataset, we aim to facilitate research in AI-powered restoration, compression, and controllable content generation for next-generation cloud gaming and entertainment.
Research on Load Forecasting Technology of Power System Based on Artificial Intelligence
2024-05-17
article · Senior author
With the development of new-type power systems, artificial intelligence plays an increasingly important role in the stability and security of power grids, and power load forecasting is one of its main applications. The power load of a city usually comprises three types: industrial, commercial, and civil. Because the commercial load is relatively fixed, forecasting industrial and civil electricity use is more important. This paper focuses on the first step of load forecasting: load classification. Based on a comparison of the characteristics of different types of power curves, a convolutional neural network load classification method based on data enhancement is proposed. The results show that data enhancement effectively improves classification accuracy, and that using a convolutional neural network for power load classification yields a better classification effect.
Frequent coauthors
- Zeyuan Allen-Zhu · 44 shared
- Sébastien Bubeck · 33 shared
- Yingyu Liang · 22 shared
- Yin Tat Lee · 21 shared
- Mark Sellke (Harvard University) · 21 shared
- Tengyu Ma · 18 shared
- Andrej Risteski · 17 shared
- Sanjeev Arora · 9 shared
Labs
Yiyun Li's Creative Writing Lab · PI
Awards & honors
- MacArthur Foundation Fellowship
- Guggenheim Fellowship
- Windham Campbell Prize
- 2026 Andrew Carnegie Medal
- 2021 Literature Award from the American Academy of Arts and…