
Lexin Li
PhD · Chair, Biostatistics Division · University of California, Berkeley · Biostatistics
Active 1992–2025
About
Lexin Li, Ph.D., is a Professor of Biostatistics and the Division Chair at the Department of Biostatistics and Epidemiology at the University of California, Berkeley. He also holds appointments in the Department of Statistics and the Helen Wills Neuroscience Institute at UC Berkeley. His research spans a broad range of topics including neuroimaging analysis, brain-computer interfaces, deep and reinforcement learning, functional and point process data analysis, tensor data analysis, and brain network analysis. Dr. Li's work integrates advanced statistical methodologies with applications in neuroscience and precision health, reflecting his interdisciplinary expertise and leadership in computational and theoretical foundations of learning and inference. He is recognized as a Fellow of several prestigious organizations including the American Association for the Advancement of Science, the Institute of Mathematical Statistics, the American Statistical Association, and the Asia-Pacific Artificial Intelligence Association, and is an elected member of the International Statistical Institute. He serves as the Editor-in-Chief of the Annals of Applied Statistics for the term 2025-2027.
Research topics
- Artificial Intelligence
- Computer Science
- Machine Learning
- Mathematics
- Statistics
- Algorithm
- Applied mathematics
- Data Mining
- Discrete mathematics
Selected publications
Cortico-basal oscillations index naturalistic movements during deep brain stimulation
Brain · 2025-12-13
article · Open access
The basal ganglia and sensorimotor cortex are essential nodes of a network that supports motor control. In Parkinson's disease, disruptions in this network lead to rigidity and slowness during movement execution. Deep brain stimulation (DBS) of the basal ganglia has proven effective in alleviating Parkinson's disease-related hypokinetic symptoms, and sensing-enabled neurostimulators now afford the opportunity to detect cortico-basal oscillations during motion. However, the specific contributions of these motor network nodes to chronic, naturalistic movement and the effects of DBS on circuit dynamics are not well understood. To address these gaps, we recorded over 530 hours of cortical and subcortical signals from 15 Parkinson's disease patients (27 hemispheres) during unsupervised, unconstrained daily activities and subthalamic or pallidal DBS. Synchronized wrist-worn accelerometers tracked forearm speeds, supporting the evaluation of neural biomarkers related to motion. Our study validated and extended the known relationship between cortical and subcortical beta power (13-30 Hz) and movement. We show that cortical low (13-20 Hz) and high (21-30 Hz) beta movement-related desynchronization (MRD) effectively distinguished between mobile and stationary states. In the subthalamic nucleus (STN) and globus pallidus interna (GPi), high beta MRD and gamma (40-80 Hz) movement-related synchronization (MRS) exhibited significant group-level correlations with movement kinematics. When stimulated at 130 Hz, cortical stimulation-entrained gamma oscillations at the half-harmonic (∼65 Hz) were observed. Further, cortical entrained gamma MRS was a stronger predictor of motion than broadband gamma MRS. We developed machine learning (ML) models to predict naturalistic movement over extended periods using spectral features from brief neural recordings (0.5-8 s epochs).
Cortical models outperformed subcortical models, although combining cortico-basal signals yielded the highest model performance (AUC > 0.85 for binary movement state classifiers; Pearson r statistic > 0.68 for continuous forearm speed regressors). Higher DBS current amplitudes were associated with reduced beta MRD and low gamma (40-60 Hz) MRS in the STN/GPi. This negatively impacted the accuracy of the subcortical models, whereas cortical and cortico-basal model performance remained stable across stimulation amplitudes. Our study demonstrates that cortico-basal nodes of the motor network encode complementary kinematic information, which can be integrated to enhance the accuracy and stability of chronic, naturalistic movement decoding during deep brain stimulation. These insights support the development and integration of therapeutic brain-computer interfaces (BCIs) with closed-loop, adaptive DBS (aDBS) to leverage rapid and precise movement-predictive models for the treatment of motor network disorders.
Multivariate dynamic mediation analysis under a reinforcement learning framework
The Annals of Statistics · 2025-02-01 · 2 citations
article · Senior author
Mediation analysis is an important analytic tool commonly used in a broad range of scientific applications. In this article, we study the problem of mediation analysis when there are multivariate and conditionally dependent mediators, and when the variables are observed over multiple time points. The problem is challenging, because the effect of a mediator involves not only the path from the treatment to this mediator itself at the current time point, but also all possible paths pointed to this mediator from its upstream mediators, as well as the carryover effects from all previous time points. We propose a novel multivariate dynamic mediation analysis approach. Drawing inspiration from the Markov decision process model that is frequently employed in reinforcement learning, we introduce a Markov mediation process paired with a system of time-varying linear structural equation models to formulate the problem. We then formally define the individual mediation effect, built upon the idea of simultaneous interventions and intervention calculus. We next derive the closed-form expression, propose an iterative estimation procedure under the Markov mediation process model, and develop a bootstrap method to infer the individual mediation effect. We study both the asymptotic property and the empirical performance of the proposed methodology, and further illustrate its usefulness with a mobile health application.
Residual Feature Integration is Sufficient to Prevent Negative Transfer
ArXiv.org · 2025-05-17
preprint · Open access · Senior author
Transfer learning has become a central paradigm in modern machine learning, yet it suffers from the long-standing problem of negative transfer, where leveraging source representations can harm rather than help performance on the target task. Although empirical remedies have been proposed, there remains little theoretical understanding of how to reliably avoid negative transfer. In this paper, we investigate a simple yet remarkably effective strategy: augmenting frozen, pretrained source-side features with a trainable target-side encoder that adapts target features to capture residual signals overlooked by models pretrained on the source data. We show this residual feature integration strategy is sufficient to provably prevent negative transfer, by establishing theoretical guarantees that it has no worse convergence rate than training from scratch under the informative class of target distributions up to logarithmic factors, and that the convergence rate can transition seamlessly from nonparametric to near-parametric when source representations are informative. To our knowledge, this is the first theoretical work that ensures protection against negative transfer. We carry out extensive numerical experiments across image, text and tabular benchmarks, and empirically verify that the method consistently safeguards performance under distribution shift, label noise, semantic perturbation, and class imbalance. We additionally demonstrate that this residual integration mechanism uniquely supports adapt-time multimodality extension, enabling a pretrained single-cell foundation model to incorporate spatial signals for lymph-node anatomical classification despite the source model being trained without them. Our study thus advances the theory of safe transfer learning, and provides a principled approach that is simple, robust, architecture-agnostic, and broadly applicable.
High-dimensional response growth curve modeling for longitudinal neuroimaging analysis
Computational Statistics & Data Analysis · 2025-07-07
article · Senior author
Stat · 2025-08-25
article
Imaging genetics–based mediation analysis provides a powerful approach to understanding gene–brain–cognition pathways. However, despite recent progress, several challenges involving group‐level and subject‐level heterogeneity remain largely unaddressed. In this article, we develop a new model and the associated methodology for imaging genetics–based subgroup mediation analysis. Our key idea is to equip a system of structural equation models with conditional Gaussian graphical models, and we allow both the mean and precision matrix to vary based on individual covariates, which in turn define a set of latent clusters. As a result, our model can simultaneously identify gene–brain–cognition pathways, uncover brain connectivity network structures, handle high‐dimensional multivariate mediators and incorporate both group‐level and subject‐level heterogeneity. We carry out simulations to demonstrate the efficacy of our proposed method and further illustrate the method with an image genetics study of Alzheimer's disease.
Contrastive Network Representation Learning
ArXiv.org · 2025-09-14
preprint · Open access
Network representation learning seeks to embed networks into a low-dimensional space while preserving the structural and semantic properties, thereby facilitating downstream tasks such as classification, trait prediction, edge identification, and community detection. Motivated by challenges in brain connectivity data analysis that is characterized by subject-specific, high-dimensional, and sparse networks that lack node or edge covariates, we propose a novel contrastive learning-based statistical approach for network edge embedding, which we name Adaptive Contrastive Edge Representation Learning (ACERL). It builds on two key components: contrastive learning of augmented network pairs, and a data-driven adaptive random masking mechanism. We establish the non-asymptotic error bounds, and show that our method achieves the minimax optimal convergence rate for edge representation learning. We further demonstrate the applicability of the learned representation in multiple downstream tasks, including network classification, important edge detection, and community detection, and establish the corresponding theoretical guarantees. We validate our method through both synthetic data and real brain connectivity studies, and show its competitive performance compared to the baseline method of sparse principal components analysis.
Structural Classification of Locally Stationary Time Series Based on Second-order Characteristics
ArXiv.org · 2025-07-06
preprint · Open access · Senior author
Time series classification is crucial for numerous scientific and engineering applications. In this article, we present a numerically efficient, practically competitive, and theoretically rigorous classification method for distinguishing between two classes of locally stationary time series based on their time-domain, second-order characteristics. Our approach builds on the autoregressive approximation for locally stationary time series, combined with an ensemble aggregation and a distance-based threshold for classification. It imposes no requirement on the training sample size, and is shown to achieve zero misclassification error rate asymptotically when the underlying time series differ only mildly in their second-order characteristics. The new method is demonstrated to outperform a variety of state-of-the-art solutions, including wavelet-based, tree-based, convolution-based methods, as well as modern deep learning methods, through intensive numerical simulations and a real EEG data analysis for epilepsy classification.
Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance
arXiv (Cornell University) · 2024-06-05
preprint · Open access
Imbalanced classification and spurious correlation are common challenges in data science and machine learning. Both issues are linked to data imbalance, with certain groups of data samples significantly underrepresented, which in turn would compromise the accuracy, robustness and generalizability of the learned models. Recent advances have proposed leveraging the flexibility and generative capabilities of large language models (LLMs), typically built on transformer architectures, to generate synthetic samples and to augment the observed data. In the context of imbalanced data, LLMs are used to oversample underrepresented groups and have shown promising improvements. However, there is a clear lack of theoretical understanding of such synthetic data approaches. In this article, we develop novel theoretical foundations to systematically study the roles of synthetic samples in addressing imbalanced classification and spurious correlation. Specifically, we first explicitly quantify the benefits of synthetic oversampling. Next, we analyze the scaling dynamics in synthetic data augmentation, and derive the corresponding scaling law. Finally, we demonstrate the capacity of transformer models to generate high-quality synthetic samples. We further conduct extensive numerical experiments to validate the efficacy of the LLM-based synthetic oversampling and augmentation.
Grassland or Cropland? Land Use Dilemma and Ecological Solutions in Inner Mongolia
Land Degradation and Development · 2024-10-11 · 5 citations
article
Inner Mongolia plays a critical role in both ecological conservation and food provision in China. However, some researchers have argued that focusing on and improving only one side of the equation necessarily threatens the functionality of the opposite side. To address this problem, we compared a “business‐as‐usual” scenario (BAU) with a “sustainable land use planning” scenario (SLU) constructed by simulating spatiotemporal changes in croplands and grasslands in Inner Mongolia from 2020 to 2030. Additionally, we analyzed the changes in ecosystem services and protein supply associated with changes in land use. We found that, in the BAU scenario, grasslands would decrease by 1.85% over the simulation period, while croplands would increase by 9.94%, with ecosystem services decreasing under both land uses. In contrast, land use changes over the same period in the SLU scenario are more significant, with increases of 11.33% and 2.78% in grassland and cropland, respectively, but, in this case, with ecosystem services increasing under both land uses. Moreover, protein supply increased under both scenarios, but the SLU scenario can provide 33% more protein than the BAU scenario. The interconversion of cropland and grassland is the main type of land conversion in the study region, while cropland, grassland, and bare land show a triangular cycle of conversion. In addition, the implementation of scenario planning can realize multiple dividends for cultivation, livestock, and ecology in Inner Mongolia.
Biometrics · 2024-03-27 · 1 citation
article · Open access · Senior author · Corresponding author
Brain-effective connectivity analysis quantifies directed influence of one neural element or region over another, and it is of great scientific interest to understand how effective connectivity pattern is affected by variations of subject conditions. Vector autoregression (VAR) is a useful tool for this type of problem. However, there is a paucity of solutions when there is measurement error, when there are multiple subjects, and when the focus is the inference of the transition matrix. In this article, we study the problem of transition matrix inference under the high-dimensional VAR model with measurement error and multiple subjects. We propose a simultaneous testing procedure, with three key components: a modified expectation-maximization (EM) algorithm, a test statistic based on the tensor regression of a bias-corrected estimator of the lagged auto-covariance given the covariates, and a properly thresholded simultaneous test. We establish the uniform consistency for the estimators of our modified EM, and show that the subsequent test achieves consistent false discovery control, with power approaching one asymptotically. We demonstrate the efficacy of our method through both simulations and a brain connectivity study of task-evoked functional magnetic resonance imaging.
Recent grants
Sufficient Dimension Reduction for Missing, Censored, and Correlated Data
NSF · $120k · 2007–2011
Collaborative Research: Tensor Envelope Model - A New Approach for Regressions with Tensor Data
NSF · $130k · 2016–2020
NSF · $100k · 2011–2014
New Statistical Methods for Multicenter Multimodal Longitudinal Neuroimaging Analysis
NIH · $1.5M · 2019–2023
CIF: Small: Collaborative Research: Graphical Modeling of Multivariate Functional Data
NSF · $250k · 2021–2025
Frequent coauthors
- Bing Li · 20 shared
- Li Zhu · University of Chinese Academy of Sciences · 18 shared
- Will Wei Sun · Purdue University West Lafayette · 17 shared
- Xia Yin · 14 shared
- Chengchun Shi · London School of Economics and Political Science · 14 shared
- Jian Kang · University of Michigan–Ann Arbor · 13 shared
- Christopher J. Nachtsheim · University of Minnesota · 13 shared
- Shuning Wang · 12 shared
Awards & honors
- Fellow of the American Statistical Association (ASA)
- Fellow of the Institute of Mathematical Statistics (IMS)
- Elected Member of the International Statistical Institute (I…
- Editor-in-Chief of the Annals of Applied Statistics for 2025…
- Lexin Li named fellow of American Association for the Advanc…