
Matthias Katzfuss
Associate Professor · Texas A&M University · Statistics
Active 2009–2026
About
Matthias Katzfuss is a Professor in the Department of Statistics at the University of Wisconsin–Madison. He is a Fellow of the American Statistical Association and has received several awards including an NSF Career Award, a Fulbright Scholarship, and an Early Investigator Award by the ASA Section on Statistics and the Environment. His research focuses on statistical modeling and inference for large and complex spatial and non-Gaussian data, with particular emphasis on scalable methods for high-dimensional and massive datasets. He has contributed to the development of Gaussian process approximations, Bayesian transport maps, and generative modeling techniques for non-Gaussian spatial fields, among other areas.
Selected publications
arXiv (Cornell University) · 2026-05-04
preprint · Open access · Senior author
Generative modeling of spatio-temporal fields is crucial for a variety of applications, including stochastic weather generators and climate-model surrogates. However, many such fields exhibit complex dependence structures that vary across space and time and are nonlinear, resulting in nonstationary and non-Gaussian joint distributions. Our approach represents the joint density of a spatio-temporal field as a product of univariate conditional distributions and models these conditionals using Gaussian processes within an autoregressive transport-map construction. This prior distribution provides regularization, making our method suitable for a small number of training samples. Data-dependent sparsity in the conditioning sets ensures scalability to high-dimensional distributions. We also propose a variant of the method designed to sample or predict forward in time from a given incomplete space-time trajectory. We demonstrate the accuracy and scalability of our approach on non-Gaussian climate-model output with tens of millions of data points.
Fast Gaussian Process Approximations for Autocorrelated Data
INFORMS Journal on Data Science · 2026-01-07
article
This paper is concerned with the problem of how to speed up computation for Gaussian process models trained on autocorrelated data. The Gaussian process model is a powerful tool commonly used in nonlinear regression applications. Standard regression modeling assumes random samples and an independently, identically distributed noise. Various fast approximations that speed up Gaussian process regression work under this standard setting. But for autocorrelated data, failing to account for autocorrelation leads to a phenomenon known as temporal overfitting that deteriorates model performance on new test instances. To handle autocorrelated data, existing fast Gaussian process approximations have to be modified; one such approach is to segment the originally correlated data points into blocks in which the blocked data are de-correlated. This work explains how to make some of the existing Gaussian process approximations work with blocked data. Numerical experiments across diverse application data sets demonstrate that the proposed approaches can remarkably accelerate computation for Gaussian process regression on autocorrelated data without compromising model prediction performance.
History: Bianca Colosimo served as the senior editor for this article.
Funding: Y. Ding received partial support from the National Science Foundation (NSF) [Grant CNS–2328395]. M. Katzfuss received partial support from the NSF [Grant DMS–1953005] and by the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin–Madison with funding from the Wisconsin Alumni Research Foundation. A. Chokhachian’s research was sponsored by the Ocean Energy Safety Institute Consortium (OESIC) through a grant from the U.S. Department of the Interior, Bureau of Safety and Environmental Enforcement (BSEE), and the U.S. Department of Energy (DOE) and was accomplished under Agreement Number E21AC00000.
Probabilistic Hydroclimate Emulation with a Digital Twin Technology for Land Surface Model Ensembles
2025-06-23
preprint · Open access
The ensemble size for Earth-system model experiments is often low due to computational and data-storage limitations. However, uncertainty quantification for science and applications typically benefits from a complete representation of the probability distribution of geophysical quantities of interest, including their spatio-temporal dependence. This work implements a statistical modeling approach known as a Bayesian transport map (BTM) to estimate this distribution from a moderate-sized ensemble of model runs. As a generative model, the BTM also enables fast simulation from the distribution to produce realistic ensembles of arbitrarily large size. The general approach is extended to a copula model suited for the complex distributions of terrestrial hydrology states simulated in land surface model (LSM) experiments. The copula BTM is applied to LSM ensembles of snow water equivalent (SWE) under contemporary and future climate scenarios. The efficiently generated BTM synthetic ensembles produce realistic depictions of the internal variability and spatial patterns of SWE across North America.
Fast Gaussian Process Approximations for Autocorrelated Data
ArXiv.org · 2025-12-02
preprint · Open access
This paper is concerned with the problem of how to speed up computation for Gaussian process models trained on autocorrelated data. The Gaussian process model is a powerful tool commonly used in nonlinear regression applications. Standard regression modeling assumes random samples and an independently, identically distributed noise. Various fast approximations that speed up Gaussian process regression work under this standard setting. But for autocorrelated data, failing to account for autocorrelation leads to a phenomenon known as temporal overfitting that deteriorates model performance on new test instances. To handle autocorrelated data, existing fast Gaussian process approximations have to be modified; one such approach is to segment the originally correlated data points into blocks in which the blocked data are de-correlated. This work explains how to make some of the existing Gaussian process approximations work with blocked data. Numerical experiments across diverse application datasets demonstrate that the proposed approaches can remarkably accelerate computation for Gaussian process regression on autocorrelated data without compromising model prediction performance.
Linear-Cost Vecchia Approximation of Multivariate Normal Probabilities
Journal of the American Statistical Association · 2025-08-14 · 1 citation
article · Senior author · Corresponding
Learning Non-Gaussian Spatial Distributions Via Bayesian Transport Maps with Parametric Shrinkage
Journal of Agricultural, Biological and Environmental Statistics · 2025-03-22
article · Senior author
Probabilistic Skip Connections for Deterministic Uncertainty Quantification in Deep Neural Networks
arXiv (Cornell University) · 2025-01-08
preprint · Open access · Senior author
Deterministic uncertainty quantification (UQ) in deep learning aims to estimate uncertainty with a single pass through a network by leveraging outputs from the network's feature extractor. Existing methods require that the feature extractor be both sensitive and smooth, ensuring meaningful input changes produce meaningful changes in feature vectors. Smoothness enables generalization, while sensitivity prevents feature collapse, where distinct inputs are mapped to identical feature vectors. To meet these requirements, current deterministic methods often retrain networks with spectral normalization. Instead of modifying training, we propose using measures of neural collapse to identify an existing intermediate layer that is both sensitive and smooth. We then fit a probabilistic model to the feature vector of this intermediate layer, which we call a probabilistic skip connection (PSC). Through empirical analysis, we explore the impact of spectral normalization on neural collapse and demonstrate that PSCs can effectively disentangle aleatoric and epistemic uncertainty. Additionally, we show that PSCs achieve uncertainty quantification and out-of-distribution (OOD) detection performance that matches or exceeds existing single-pass methods requiring training modifications. By retrofitting existing models, PSCs enable high-quality UQ and OOD capabilities without retraining.
A Bayesian hierarchical model for climate change detection and attribution
UNC Libraries · 2025-09-11
article · Open access
Regression-based detection and attribution methods continue to take a central role in the study of climate change and its causes. Here we propose a novel Bayesian hierarchical approach to this problem, which allows us to address several open methodological questions. Specifically, we take into account the uncertainties in the true temperature change due to imperfect measurements, the uncertainty in the true climate signal under different forcing scenarios due to the availability of only a small number of climate model simulations, and the uncertainty associated with estimating the climate variability covariance matrix, including the truncation of the number of empirical orthogonal functions (EOFs) in this covariance matrix. We apply Bayesian model averaging to assign optimal probabilistic weights to different possible truncations and incorporate all uncertainties into the inference on the regression coefficients. We provide an efficient implementation of our method in a software package and illustrate its use with a realistic application.
Sparse Inverse Cholesky Factorization of Dense Kernel Matrices by Greedy Conditional Selection
SIAM/ASA Journal on Uncertainty Quantification · 2025-09-17
article
Labs
Matthias Katzfuss Research Group · PI
Awards & honors
- Fellow of the American Statistical Association
- NSF Career Award
- Fulbright Scholarship
- Early Investigator Award by the ASA Section on Statistics an…