
About
Matthew Stephens is a Professor of Human Genetics at the University of Chicago. His research addresses problems at the interface of statistics and genetics, typically by developing new statistical methods with substantial computational components. His lab creates new approaches to analyzing genetic data, with a particular emphasis on fine-mapping quantitative trait loci (QTLs) from genome-scale molecular profiles, as exemplified by tools such as fSuSiE. The lab's background is primarily quantitative: team members come from fields such as statistics and computer science and have varying levels of biological training. His research contributions include methods for dissecting tumor transcriptional heterogeneity from single-cell RNA sequencing data, work on genetic contributions to epigenetically defined endotypes of allergic phenotypes in children, and studies of the properties of disease-associated loci in response to environmental exposures.
Research topics
- Biology
- Genetics
- Computational biology
- Mathematics
- Computer Science
- Artificial Intelligence
- Sociology
- Machine Learning
- Geography
- Cell biology
- Demography
- Ecology
- Mathematical optimization
- Algorithm
- Statistics
Selected publications
Journal of Computational and Graphical Statistics · 2026-04-07
preprint · Open access · Senior author
Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.
Mammary intraepithelial lymphocytes and intestinal inputs shape T cell dynamics in lactogenesis
Nature Immunology · 2025-07-29 · 7 citations
article · Open access
Covariate-moderated Empirical Bayes Matrix Factorization
ArXiv.org · 2025-05-16
preprint · Open access · Senior author
Matrix factorization is a fundamental method in statistics and machine learning for inferring and summarizing structure in multivariate data. Modern data sets often come with "side information" of various forms (images, text, graphs) that can be leveraged to improve estimation of the underlying structure. However, existing methods that leverage side information are limited in the types of data they can incorporate, and they assume specific parametric models. Here, we introduce a novel method for this problem, covariate-moderated empirical Bayes matrix factorization (cEBMF). cEBMF is a modular framework that accepts any type of side information that is processable by a probabilistic model or a neural network. The cEBMF framework can accommodate different assumptions and constraints on the factors through the use of different priors, and it adapts these priors to the data. We demonstrate the benefits of cEBMF in simulations and in analyses of spatial transcriptomics and collaborative filtering data. A PyTorch-based implementation of cEBMF with flexible priors is available at https://github.com/william-denault/cebmf_torch.
Nature Genetics · 2025-01-01 · 11 citations
article · Open access · Senior author · Corresponding
PubMed · 2025-06-06
preprint · Open access · Senior author
SuSiE 2.0: improved methods and implementations for genetic fine-mapping and phenotype prediction
bioRxiv (Cold Spring Harbor Laboratory) · 2025-11-28 · 1 citation
preprint · Open access
Sum of Single Effects regression (SuSiE) has become widely adopted for genetic fine-mapping, yet its original implementation faces architectural limitations that hinder extensibility and performance. We present SuSiE 2.0, featuring a modular redesign for extensibility, up to 5x speed improvements for summary statistics applications, and several useful extensions including SuSiE-ash, a new method that improves calibration when strong signals coexist with moderate effects. Simulations and real data benchmarks demonstrate performance across diverse genetic architectures, highlighting improved calibration of SuSiE-ash for fine-mapping under complex polygenic backgrounds with 1.5-3x FDR reduction while maintaining power, and revealing SuSiE-based methods as effective yet underappreciated tools for TWAS prediction.
smashr: Smoothing by Adaptive Shrinkage
2025-12-15
dataset · Open access
Fast, wavelet-based Empirical Bayes shrinkage methods for signal denoising, including smoothing Poisson-distributed data and Gaussian-distributed data with possibly heteroskedastic error. The algorithms implement the methods described in Z. Xing, P. Carbonetto & M. Stephens (2021) <https://jmlr.org/papers/v22/19-042.html>.
Disease-associated loci share properties with response eQTLs under common environmental exposures
bioRxiv (Cold Spring Harbor Laboratory) · 2025-05-04
preprint · Open access
Many of the genetic loci associated with disease are expected to have context-dependent regulatory effects that are underrepresented in the transcriptomes of healthy, steady-state adult tissues. To understand gene regulation across diverse environmental conditions and cellular contexts, we treated a broad array of human cell types with three environmental exposures in vitro. With single-cell RNA-sequencing data from 1.4 million cells across 51 individuals, we identified hundreds of response expression quantitative trait loci (eQTLs) that are associated with inter-individual differences in regulatory changes following treatment with nicotine, caffeine, or ethanol in diverse cell types. We also identified dynamic regulatory effects that vary across differentiation trajectories in response to exposure. In contrast to steady-state eQTLs, and similar to disease risk loci, response eQTLs are enriched in distal enhancers and regulate genes that have experienced strong selective constraint, contain complex regulatory landscapes, and display diverse biological functions. We identified response eQTLs that coincide with disease-associated loci not explained by steady-state eQTLs. Our results highlight the complexity of genetic regulatory effects and suggest that our ability to interpret disease-associated loci will benefit from the pursuit of studies of gene-by-environment interactions in diverse biological contexts.
Research Square · 2025-10-16
preprint · Open access · Senior author
Disease-associated loci share properties with response eQTLs under common environmental exposures
Research Square · 2025-05-14 · 1 citation
preprint · Open access
Recent grants
Genome analysis: statistical methods and applications
NIH · $7.9M · 2002–2028
NIH · $1.1M · 2018
NIH · $880k · 2011
Frequent coauthors
- 67 shared
Peter Carbonetto
University of Chicago
- 60 shared
Ayellet V. Segrè
Broad Institute
- 57 shared
Sarah Kim-Hellmuth
- 51 shared
Gao Wang
- 50 shared
Tuuli Lappalainen
Science for Life Laboratory
- 49 shared
Jonathan K. Pritchard
Stanford University
- 42 shared
Andrew R. Hamel
Massachusetts Eye and Ear Infirmary
- 38 shared
Xiaoquan Wen
University of Michigan–Ann Arbor
Labs
Matthew Stephens Lab at University of Chicago