
Lingzhou Xue
Professor · Verified · Pennsylvania State University · Statistics
Active 2007–2026
About
Lingzhou Xue is a Professor of Statistics at Penn State. He received his B.Sc. in Statistics from Peking University in 2008 and his Ph.D. in Statistics from the University of Minnesota in 2012. He was a postdoctoral research associate at Princeton University from 2012 to 2013. His research interests include high-dimensional statistics, nonparametric statistics, statistical and machine learning, large-scale optimization, and statistical modeling in the biomedical, environmental, and social sciences. His recent research focuses on causal inference, federated learning, graphical models, high-dimensional inference, optimal transport, random objects, and reinforcement learning. He is a dedicated mentor to Ph.D. students and postdoctoral researchers; five of his former advisees have become tenure-track faculty members in statistics.
Research topics
- Computer Science
- Artificial Intelligence
- Mathematics
- Statistics
- Biology
- Algorithm
- Geography
- Waste management
- Cartography
- Computational biology
- Engineering
- Environmental science
- Mathematical optimization
- Bioinformatics
- Applied mathematics
- Petroleum engineering
Selected publications
Open MIND · 2026-02-23
preprint · Senior author
We study gap-dependent performance guarantees for nearly minimax-optimal algorithms in reinforcement learning with linear function approximation. While prior works have established gap-dependent regret bounds in this setting, existing analyses do not apply to algorithms that achieve the nearly minimax-optimal worst-case regret bound $\tilde{O}(d\sqrt{H^3K})$, where $d$ is the feature dimension, $H$ is the horizon length, and $K$ is the number of episodes. We bridge this gap by providing the first gap-dependent regret bound for the nearly minimax-optimal algorithm LSVI-UCB++ (He et al., 2023). Our analysis yields improved dependencies on both $d$ and $H$ compared to previous gap-dependent results. Moreover, leveraging the low policy-switching property of LSVI-UCB++, we introduce a concurrent variant that enables efficient parallel exploration across multiple agents and establish the first gap-dependent sample complexity upper bound for online multi-agent RL with linear function approximation, achieving linear speedup with respect to the number of agents.
arXiv (Cornell University) · 2026-02-23
article · Open access · Senior author
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization
arXiv (Cornell University) · 2026-05-06
preprint · Open access
On-policy distillation is an efficient alternative to reinforcement learning, offering dense token-level training signals. However, its reliance on a stronger external teacher has driven recent work on on-policy self-distillation, where the same model serves as both teacher and student under different prompt contexts. Yet, existing self-distillation methods largely reduce learning to KL matching toward the context-augmented teacher model. This approach often suffers from training instability and can degrade reasoning performance over time. Moreover, self-distillation from the same model with prompt augmentation lacks the exploratory diversity provided by a genuine external teacher. To address these limitations, we move beyond fixed-teacher KL matching and propose Preference-Based Self-Distillation (PBSD), which revisits on-policy self-distillation through a reward-regularized perspective. Instead of directly matching the teacher distribution, we derive a reward-regularized objective whose analytic optimum is a reward-reweighted teacher distribution, yielding a target policy provably superior to the original teacher under this objective. Practically, PBSD optimizes preference gaps between teacher and student samples while maintaining on-policy student sampling. We support this framework with a statistical analysis of the induced preference-learning problem, formally establishing when on-policy self-distillation is preferable to learning from an external teacher in our setting. Experiments on mathematical reasoning and tool-use benchmarks across multiple model scales demonstrate that PBSD consistently achieves the strongest average performance among comparable baselines, showing improved training stability over prior self-distillation baselines while preserving token efficiency.
A Unified Framework for Nonlinear Mediation Analysis of Random Objects
arXiv (Cornell University) · 2026-03-30
article · Open access · Senior author
Mediation analysis for complex, non-Euclidean data, such as probability distributions, compositions, images, and networks, presents significant methodological challenges due to the inherent nonlinearity and geometric constraints of such spaces. Existing approaches are often restricted to Euclidean settings or specific data types. We propose Random Object Mediation Analysis (ROMA), a unified framework that simultaneously accommodates object-valued exposures, mediators, and outcomes, enabling the analysis of nonlinear causal pathways in general metric spaces. ROMA leverages an additive Reproducing Kernel Hilbert Space (RKHS) operator model to rigorously disentangle direct and indirect causal pathways, which is a significant advancement over existing single-predictor or purely predictive additive frameworks. Theoretically, we establish the nonparametric identification of causal effects and derive global asymptotic normality for our estimators. Crucially, this theoretical foundation enables the construction of simultaneous confidence bands and global test statistics without the need for computationally intensive resampling. We demonstrate the practical utility of ROMA through simulations and real-world applications involving compositional mediators and distributional outcomes, extending the scope of mediation analysis.
Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization
ArXiv.org · 2026-05-06
article · Open access
EXACT: Explicit Attribute-Guided Decoding-Time Personalization
Open MIND · 2026-02-06
preprint · Senior author
Achieving personalized alignment requires adapting large language models to each user's evolving context. While decoding-time personalization offers a scalable alternative to training-time methods, existing methods largely rely on implicit, less interpretable preference representations and impose a rigid, context-agnostic user representation, failing to account for how preferences shift across prompts. We introduce EXACT, a new decoding-time personalization that aligns generation with limited pairwise preference feedback using a predefined set of interpretable attributes. EXACT first identifies user-specific attribute subsets by maximizing the likelihood of preferred responses in the offline stage. Then, for online inference, EXACT retrieves the most semantically relevant attributes for an incoming prompt and injects them into the context to steer generation. We establish theoretical approximation guarantees for the proposed algorithm under mild assumptions, and provably show that our similarity-based retrieval mechanism effectively mitigates contextual preference shifts, adapting to disparate tasks without pooling conflicting preferences. Extensive experiments on human-annotated preference datasets demonstrate that EXACT consistently outperforms strong baselines, including preference modeling accuracy and personalized generation quality.
EXACT: Explicit Attribute-Guided Decoding-Time Personalization
arXiv (Cornell University) · 2026-02-06
article · Open access · Senior author
Science China Mathematics · 2026-02-10
article · Senior author · Corresponding
Advances in Education · 2026-01-01
article · 1st author · Corresponding
A Unified Framework for Nonlinear Mediation Analysis of Random Objects
arXiv (Cornell University) · 2026-03-30
preprint · Open access · Senior author
Recent grants
Collaborative Research: New Methods, Theory and Applications for Nonsmooth Manifold-Based Learning
NSF · $200k · 2020–2024
Collaborative Research: New Statistical Methods and Theory for High-Dimensional Data
NSF · $126k · 2015–2018
Innovated Statistical Inference for Complex and High-Dimensional Data
NSF · $195k · 2018–2023
NSF · $215k · 2020–2024
NIH · $918k · 2023–2027
Frequent coauthors
- Hui Zou · Yangzhou University · 38 shared
- Jianqing Fan · 22 shared
- Shiqian Ma · Rice University · 15 shared
- Amal Agarwal · 13 shared
- Xiufan Yu · 13 shared
- Danning Li · 12 shared
- Kevin H. Lee · Western Michigan University · 11 shared
- Bingyuan Liu · Pennsylvania State University · 11 shared
Labs
Department of Statistics · PI
Education
- Ph.D., Statistics · University of Minnesota · 2012
Awards & honors
- Penn State Schreyer Honors College (SHC) Excellence in Advis…
- Institute of Mathematical Statistics (IMS) Fellow, 2024
- Penn State Huck Institutes Leadership Fellow, 2024
- American Statistical Association (ASA) Fellow, 2023
- National Institute of Statistical Sciences (NISS) Distinguis…