Matthew Stephens

Professor · Verified

University of Chicago · Medical Genetics

Active 1972–2026

h-index: 89
Citations: 110.3k
Papers: 299 (74 in the last 5 years)
Funding: $10.0M (1 active)

About

Matthew Stephens is a Professor in the Department of Human Genetics at the University of Chicago. His research addresses problems at the interface of statistics and genetics, often by developing new statistical methods with a substantial computational component. His lab creates approaches for analyzing genetic data, with particular emphasis on fine-mapping quantitative trait loci (QTLs) from genome-scale molecular profiles, as exemplified by tools such as fSuSiE. His background is primarily quantitative, and his team members come from fields such as Statistics and Computer Science, with varying levels of biological training. His research contributions include methods for dissecting tumor transcriptional heterogeneity from single-cell RNA sequencing data, studies of genetic contributions to epigenetically defined endotypes of allergic phenotypes in children, and analyses of the properties of disease-associated loci in response to environmental exposures.

Research topics

  • Biology
  • Genetics
  • Computational biology
  • Mathematics
  • Computer Science
  • Artificial Intelligence
  • Sociology
  • Machine Learning
  • Geography
  • Cell biology
  • Demography
  • Ecology
  • Mathematical optimization
  • Algorithm
  • Statistics

Selected publications

  • Empirical Bayes Covariance Decomposition, and a Solution to the Multiple Tuning Problem in Sparse PCA

    Journal of Computational and Graphical Statistics · 2026-04-07

    preprint · Open access · Senior author

    Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.

  • Mammary intraepithelial lymphocytes and intestinal inputs shape T cell dynamics in lactogenesis

    Nature Immunology · 2025-07-29 · 7 citations

    article · Open access

  • Covariate-moderated Empirical Bayes Matrix Factorization

    ArXiv.org · 2025-05-16

    preprint · Open access · Senior author

    Matrix factorization is a fundamental method in statistics and machine learning for inferring and summarizing structure in multivariate data. Modern data sets often come with "side information" of various forms (images, text, graphs) that can be leveraged to improve estimation of the underlying structure. However, existing methods that leverage side information are limited in the types of data they can incorporate, and they assume specific parametric models. Here, we introduce a novel method for this problem, covariate-moderated empirical Bayes matrix factorization (cEBMF). cEBMF is a modular framework that accepts any type of side information that is processable by a probabilistic model or a neural network. The cEBMF framework can accommodate different assumptions and constraints on the factors through the use of different priors, and it adapts these priors to the data. We demonstrate the benefits of cEBMF in simulations and in analyses of spatial transcriptomics and collaborative filtering data. A PyTorch-based implementation of cEBMF with flexible priors is available at https://github.com/william-denault/cebmf_torch.

  • Dissecting tumor transcriptional heterogeneity from single-cell RNA-seq data by generalized binary covariance decomposition

    Nature Genetics · 2025-01-01 · 11 citations

    article · Open access · Senior author · Corresponding

  • Bayesian Variable Selection in a Cox Proportional Hazards Model with the "Sum of Single Effects" Prior

    PubMed · 2025-06-06

    preprint · Open access · Senior author

  • SuSiE 2.0: improved methods and implementations for genetic fine-mapping and phenotype prediction

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-11-28 · 1 citation

    preprint · Open access

    Sum of Single Effects regression (SuSiE) has become widely adopted for genetic fine-mapping, yet its original implementation faces architectural limitations that hinder extensibility and performance. We present SuSiE 2.0, featuring a modular redesign for extensibility, up to 5x speed improvements for summary statistics applications, and several useful extensions including SuSiE-ash, a new method that improves calibration when strong signals coexist with moderate effects. Simulations and real data benchmarks demonstrate performance across diverse genetic architectures, highlighting improved calibration of SuSiE-ash for fine-mapping under complex polygenic backgrounds with 1.5-3x FDR reduction while maintaining power, and revealing SuSiE-based methods as effective yet underappreciated tools for TWAS prediction.

  • smashr: Smoothing by Adaptive Shrinkage

    2025-12-15

    dataset · Open access

    Fast, wavelet-based Empirical Bayes shrinkage methods for signal denoising, including smoothing Poisson-distributed data and Gaussian-distributed data with possibly heteroskedastic error. The algorithms implement the methods described in Z. Xing, P. Carbonetto & M. Stephens (2021) <https://jmlr.org/papers/v22/19-042.html>.

  • Disease-associated loci share properties with response eQTLs under common environmental exposures

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-05-04

    preprint · Open access

    Many of the genetic loci associated with disease are expected to have context-dependent regulatory effects that are underrepresented in the transcriptomes of healthy, steady-state adult tissues. To understand gene regulation across diverse environmental conditions and cellular contexts, we treated a broad array of human cell types with three environmental exposures in vitro. With single-cell RNA-sequencing data from 1.4 million cells across 51 individuals, we identified hundreds of response expression quantitative loci (eQTLs) that are associated with inter-individual differences in regulatory changes following treatment with nicotine, caffeine, or ethanol in diverse cell types. We also identified dynamic regulatory effects that vary across differentiation trajectories in response to exposure. In contrast to steady-state eQTLs, and similar to disease risk loci, response eQTLs are enriched in distal enhancers and are regulating genes that experienced strong selective constraint, contain complex regulatory landscapes, and display diverse biological functions. We identified response eQTLs that coincide with disease-associated loci not explained by steady-state eQTLs. Our results highlight the complexity of genetic regulatory effects and suggest that our ability to interpret disease-associated loci will benefit from the pursuit of studies of gene-by-environment interactions in diverse biological contexts.

  • Accounting for uncertainty in residual variances improves calibration for fine-mapping with small sample sizes

    Research Square · 2025-10-16

    preprint · Open access · Senior author

  • Disease-associated loci share properties with response eQTLs under common environmental exposures

    Research Square · 2025-05-14 · 1 citation

    preprint · Open access

Frequent coauthors

  • Peter Carbonetto

    University of Chicago

    67 shared
  • Ayellet V. Segrè

    Broad Institute

    60 shared
  • Sarah Kim-Hellmuth

    57 shared
  • Gao Wang

    51 shared
  • Tuuli Lappalainen

    Science for Life Laboratory

    50 shared
  • Jonathan K. Pritchard

    Stanford University

    49 shared
  • Andrew R. Hamel

    Massachusetts Eye and Ear Infirmary

    42 shared
  • Xiaoquan Wen

    University of Michigan–Ann Arbor

    38 shared
