
Yixu Chen
Verified · Princeton University · Art and Archaeology
Active 1998–2025
About
Yixu Eliza Chen is a Ph.D. candidate specializing in the art and visual culture of late imperial and modern China. Her research interests include transmediality, reproduction, antiquarianism, and the visual dimensions of knowledge production. Her dissertation examines how ink rubbing—a medium long valued for reproducing and transmitting antiquities—evolved during the nineteenth and early twentieth centuries into a transmedial site of artistic and epistemic change. This evolution is explored through the intersection of rubbing practices with photography, photomechanical printing, and graphic design. Drawing on approaches from media studies and the history of science, her research investigates how media technologies reshaped ways of seeing and knowing the past. Her work considers the dialogue between literati and popular visual cultures and the transcultural flows of people, images, and ideas, highlighting the impact of technological and cultural exchanges on visual and epistemic practices.
Research topics
- Artificial Intelligence
- Computer Science
- Mathematics
- Algorithm
- Statistics
- Mathematical optimization
Selected publications
The Emergence of Abstract Thought in Large Language Models Beyond Any Language
ArXiv.org · 2025-06-11
preprint · Open access · 1st author · Corresponding
As large language models (LLMs) continue to advance, their capacity to function effectively across a diverse range of languages has shown marked improvement. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. This has led to the widespread assumption that LLMs may "think" in English. However, more recent results showing strong multilingual performance, even surpassing English performance on specific tasks in other languages, challenge this view. In this work, we find that LLMs progressively develop a core language-agnostic parameter space: a remarkably small subset of parameters whose deactivation results in significant performance degradation across all languages. This compact yet critical set of parameters underlies the model's ability to generalize beyond individual languages, supporting the emergence of abstract thought that is not tied to any specific linguistic system. Specifically, we identify language-related neurons, those that are consistently activated during the processing of particular languages, and categorize them as either shared (active across multiple languages) or exclusive (specific to one). As LLMs undergo continued development over time, we observe a marked increase in both the proportion and functional importance of shared neurons, while exclusive neurons progressively diminish in influence. These shared neurons constitute the backbone of the core language-agnostic parameter space, supporting the emergence of abstract thought. Motivated by these insights, we propose neuron-specific training strategies tailored to LLMs' language-agnostic levels at different development stages. Experiments across diverse LLM families support our approach.
Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models
ArXiv.org · 2025-10-20
preprint · Open access
Balancing exploration and exploitation during reinforcement learning fine-tuning of generative models presents a critical challenge, as existing approaches rely on fixed divergence regularization that creates an inherent dilemma: strong regularization preserves model capabilities but limits reward optimization, while weak regularization enables greater alignment but risks instability or reward hacking. We introduce Adaptive Divergence Regularized Policy Optimization (ADRPO), which automatically adjusts regularization strength based on advantage estimates, reducing regularization for high-value samples while applying stronger regularization to poor samples. This enables policies to navigate between exploration and aggressive exploitation according to data quality. Our implementation with Wasserstein-2 regularization for flow matching generative models achieves better semantic alignment and diversity on text-to-image generation than offline methods like DPO and online methods with fixed regularization like ORW-CFM-W2. ADRPO enables a 2B parameter SD3 model to surpass much larger models with 4.8B and 12B parameters in attribute binding, semantic consistency, artistic style transfer, and compositional control while maintaining generation diversity. ADRPO generalizes to KL-regularized fine-tuning of both text-only LLMs and multi-modal reasoning models, enhancing existing online RL methods like GRPO. In LLM fine-tuning, ADRPO demonstrates an emergent ability to escape local optima through active exploration. In multi-modal audio reasoning, it outperforms GRPO through superior step-by-step reasoning, enabling a 7B model to outperform substantially larger commercial models including Gemini 2.5 Pro and GPT-4o Audio. ADRPO thus offers an effective plug-and-play solution to the exploration-exploitation challenge across diverse generative architectures and modalities.
Settling the Sample Complexity of Online Reinforcement Learning
Journal of the ACM · 2025-05-02 · 2 citations
article · Open access
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a “large-sample” regime, imposing enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of MVP (Monotonic Value Propagation), an optimistic model-based algorithm proposed by Zhang et al. [82], achieves a regret on the order of (modulo log factors) \(\min\big\{\sqrt{SAH^3K},\, HK\big\}\), where S is the number of states, A is the number of actions, H is the horizon length, and K is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample size K ≥ 1, essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield ε-accuracy) of \(\frac{SAH^3}{\varepsilon^2}\) up to log factors, which is minimax-optimal for the full ε-range. Further, we extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances. The key technical innovation lies in a novel analysis paradigm (based on a new concept called “profiles”) to decouple complicated statistical dependency across the sample trajectories, a long-standing challenge facing the analysis of online RL in the sample-starved regime.
Using LLMs for Automated Privacy Policy Analysis: Prompt Engineering, Fine-Tuning and Explainability
ArXiv.org · 2025-03-16 · 1 citation
preprint · Open access · 1st author · Corresponding
Privacy policies are widely used by digital services and often required for legal purposes. Many machine learning based classifiers have been developed to automate detection of different concepts in a given privacy policy, which can help facilitate other automated tasks such as producing a more reader-friendly summary and detecting legal compliance issues. Despite the successful applications of large language models (LLMs) to many NLP tasks in various domains, there is very little work studying the use of LLMs for automated privacy policy analysis, so whether and how LLMs can help automate privacy policy analysis remains under-explored. To fill this research gap, we conducted a comprehensive evaluation of LLM-based privacy policy concept classifiers, employing both prompt engineering and LoRA (low-rank adaptation) fine-tuning, on four state-of-the-art (SOTA) privacy policy corpora and taxonomies. Our experimental results demonstrated that combining prompt engineering and fine-tuning can make LLM-based classifiers outperform other SOTA methods, significantly and consistently across privacy policy corpora/taxonomies and concepts. Furthermore, we evaluated the explainability of the LLM-based classifiers using three metrics: completeness, logicality, and comprehensibility. For all three metrics, a score exceeding 91.1% was observed in our evaluation, indicating that LLMs are useful not only for improving classification performance but also for enhancing the explainability of detection results.
Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods
SIAM Journal on Optimization · 2025-06-18 · 1 citation
article · Senior author
Optimal Multi-Distribution Learning
Journal of the ACM · 2025-08-25
article · Open access
Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across k distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness, multi-group collaboration, and so on. Achieving data-efficient MDL necessitates adaptive sampling, also called on-demand sampling, throughout the learning process. However, there exist substantial gaps between the state-of-the-art upper and lower bounds on the optimal sample complexity. Focusing on a hypothesis class of Vapnik–Chervonenkis (VC) dimension d, we propose a novel algorithm that yields an ε-optimal randomized hypothesis with a sample complexity on the order of \(\frac{d+k}{\varepsilon^2}\) (modulo some logarithmic factor), matching the best-known lower bound. Our algorithmic ideas and theory are further extended to accommodate Rademacher classes. The proposed algorithms are oracle-efficient: they access the hypothesis class solely through an empirical risk minimization oracle. Additionally, we establish the necessity of improper learning, revealing a large sample size barrier when only deterministic, proper hypotheses are permitted. These findings resolve three open problems presented in COLT 2023 (i.e., Awasthi et al. [4, Problems 1, 3, and 4]).
The Triple Variation of Marx’s Labor Theory of Value in the Era of Artificial Intelligence
2025-05-21
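article · 1st author · Corresponding
The labor theory of value is the foundational theory of Marxist political economy. At the theoretical level, it is especially important to analyze in depth the uniqueness of the source of value, and to state clearly that actively engaging in work is the only way to drive value creation. At the same time, one must recognize the changes in workers' status as subjects and their irreplaceability, examine the nature of capital, and understand that artificial intelligence both promotes capital and carries potential negative effects for it. At the practical level, we should uphold the monism of living labor as the source of value throughout development, emphasize the significance of labor, and deepen our understanding of living labor. In addition, we must firmly safeguard workers' status as subjects, give full play to their role, and promote the all-round development of the person. In short, we should respond actively to this question with dialectical-materialist thinking and a historical perspective, so that the theory continues to provide scientific guidance in the era of artificial intelligence, coordinates the relationship between artificial intelligence and workers, and promotes the optimization of the socioeconomic structure and the progress of civilization.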
2025-03-27
preprint · Open access
Real-time monitoring of plant nutrient levels, particularly phosphate, is essential for optimizing plant growth and addressing nutrient imbalances in precision agriculture. Conventional sensors mostly suffer from poor stability, reproducibility, matrix effects, and high costs, limiting their scalability and practical application. To overcome these challenges, a deep learning (DL)-integrated remote-gate field-effect transistor (FET) sensor utilizing a plant-derived graphene electrode is introduced for enhanced performance and reliability. These solution-processed graphene electrodes composed of cellulose nanocrystals (CNCs) from plant fibers are functionalized with phosphate-capturing ferritin and serve as the sensing surface, capacitively coupled to a commercial n-type FET, addressing device variability issues. DL integration significantly improved accuracy, enabling robust and precise phosphate detection. The sensor demonstrates a sensitivity of 14.1 mV/dec after the pH correction, a coefficient of variation (CV) of responses below 5%, and a 1 ng/mL detection limit. As a proof-of-concept, phosphate levels in Hoagland solution, a standard plant nutrient medium, were monitored, achieving an r² of 0.951 and a CV of 5.39%. A handheld prototype system further demonstrates its potential for on-site continuous monitoring. This sustainable and cost-effective approach provides a scalable solution for real-time phosphate detection with high sensitivity and reproducibility, meeting agricultural demands.
Statistical and Algorithmic Foundations of Reinforcement Learning
2025-10-01
book-chapter
Residual Policy Gradient: A Reward View of KL-regularized Objective
ArXiv.org · 2025-03-14
preprint · Open access
Reinforcement Learning and Imitation Learning have achieved widespread success in many domains but remain constrained during real-world deployment. One main issue is that deployment brings additional requirements that were not considered during training. To address this challenge, policy customization has been introduced, aiming to adapt a prior policy while preserving its inherent properties and meeting new task-specific requirements. A principled approach to policy customization is Residual Q-Learning (RQL), which formulates the problem as a Markov Decision Process (MDP) and derives a family of value-based learning algorithms. However, RQL has not yet been applied to policy gradient methods, which restricts its applicability, especially in tasks where policy gradient has already proven more effective. In this work, we first derive a concise form of Soft Policy Gradient as a preliminary. Building on this, we introduce Residual Policy Gradient (RPG), which extends RQL to policy gradient methods, allowing policy customization in gradient-based RL settings. From the perspective of RPG, we rethink the KL-regularized objective widely used in RL fine-tuning. We show that, under certain assumptions, the KL-regularized objective leads to a maximum-entropy policy that balances the inherent properties and task-specific requirements at the reward level. Our experiments in MuJoCo demonstrate the effectiveness of Soft Policy Gradient and Residual Policy Gradient.
Recent grants
RI: Small: Uncertainty Quantification for Nonconvex Low-Complexity Models
NSF · $450k · 2022–2026
RI: Small: Uncertainty Quantification for Nonconvex Low-Complexity Models
NSF · $450k · 2021–2022
CIF: Small: Taming Nonconvexity in High-Dimensional Statistical Estimation
NSF · $500k · 2019–2024
NSF · $288k · 2022–2024
NSF · $385k · 2019–2022
Frequent coauthors
- 57 shared
Yuejie Chi
- 23 shared
Jianqing Fan
- 23 shared
Yuting Wei
University of Pennsylvania
- 20 shared
Andrea Goldsmith
Princeton University
- 18 shared
Cong Ma
Northwestern Polytechnical University
- 16 shared
Yuling Yan
Massachusetts Institute of Technology
- 16 shared
Changxiao Cai
University of Michigan–Ann Arbor
- 15 shared
H. Vincent Poor
Princeton University
Education
- 2017
Postdoc, Statistics
Stanford University
- 2015
Ph.D., Electrical Engineering
Stanford University
- 2015
Ph.D. minor, Management Science and Engineering
Stanford University
- 2013
M.A., Statistics
Stanford University
- 2010
M.S., Electrical and Computer Engineering
University of Texas at Austin
- 2008
B.Eng., Microelectronics
Tsinghua University