
Adityanand Guntuboyina
VerifiedUniversity of California, Berkeley · Department of Statistics
Active 2008–2026
About
Adityanand Guntuboyina is a professor and Deputy Chair in the Department of Statistics at the University of California, Berkeley. His research interests include nonparametric and high-dimensional statistics, shape constrained statistical estimation, empirical processes, and statistical information theory. He is engaged in advancing methodologies related to nonparametric estimation, shape-constrained estimation, and high-dimensional data analysis, contributing to the theoretical foundations and applications within these areas.
Research signals
Five dimensions sourced from public faculty / publication signals. Sign in to compare against your own profile and see your match score.
Research topics
- Mathematics
- Applied mathematics
- Combinatorics
- Statistics
- Mathematical optimization
Selected publications
What Functions Does XGBoost Learn?
arXiv (Cornell University) · 2026-01-09
preprintOpen accessSenior authorThis paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ with penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ and $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ in terms of Hardy--Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}}: V^{d, s}_{\infty-\text{XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
Journal of the American Statistical Association · 2026-01-30
articleSenior authorCorrespondingFigshare · 2026-01-30
datasetOpen accessSenior authorShape constraints in nonparametric regression provide a powerful framework for estimating regression functions under realistic assumptions without tuning parameters. However, most existing methods—except additive models—impose too weak restrictions, often leading to overfitting in high dimensions. Conversely, additive models can be too rigid, failing to capture covariate interactions. This article introduces a novel multivariate shape-constrained regression approach based on total concavity, originally studied by T. Popoviciu. Our method allows interactions while mitigating the curse of dimensionality, with convergence rates that depend only logarithmically on the number of covariates. We characterize and compute the least squares estimator over totally concave functions, derive theoretical guarantees, and demonstrate its practical effectiveness through empirical studies on real-world datasets. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
Figshare · 2026-01-30
datasetOpen accessSenior authorShape constraints in nonparametric regression provide a powerful framework for estimating regression functions under realistic assumptions without tuning parameters. However, most existing methods—except additive models—impose too weak restrictions, often leading to overfitting in high dimensions. Conversely, additive models can be too rigid, failing to capture covariate interactions. This article introduces a novel multivariate shape-constrained regression approach based on total concavity, originally studied by T. Popoviciu. Our method allows interactions while mitigating the curse of dimensionality, with convergence rates that depend only logarithmically on the number of covariates. We characterize and compute the least squares estimator over totally concave functions, derive theoretical guarantees, and demonstrate its practical effectiveness through empirical studies on real-world datasets. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
Gaussian mixtures and non-parametric likelihoods through the lens of statistical mechanics
ArXiv.org · 2026-03-24
articleOpen accessIn this work, we investigate Gaussian Mixture Models ({\it abbrv} GMM) and the related problem of non parametric maximum likelihood estimation ({\it abbrv} NPMLE) from the perspective of statistical mechanics. In particular, we establish stability guarantees for the NPMLE procedure that extend well beyond the state of the art. Crucially, we obtain guarantees on the Kullback-Leibler divergence between NPMLE estimators and the ground truth, a type of result which has been known to be challenging in the literature on this problem. In particular, we provide high probability upper bounds on the KL divergence between the NPMLE and the true density that are of the order of $\min\big\{\frac{(\log n)^{d+2}}{n} , \frac{\log n}{\sqrt n}\big\}$, which cover a wide range of scenarios for the comparative sizes of $n$ and $d$. We obtain similar guarantees for approximate solutions to the NPMLE problem, addressing realistic situations wherein optimization algorithms need to be stopped in finite time, allowing access only to approximations to the true NPMLE. A technical cornerstone of our approach is an analysis of the function class complexity of logarithms of gaussian mixture densities, which is able to handle their unboundedness, and could be of wider interest. We also establish correspondences between stability phenomena in the NPMLE problem and concepts from chaos and multiple valleys in random energy landscapes of statistical mechanics models. We believe that these correspondences may be useful for a wide variety of random optimization problems in statistics and machine learning, especially the connections to the the technical ingredients of concentration phenomena and Langevin dynamics for these models.
Gaussian mixtures and non-parametric likelihoods through the lens of statistical mechanics
arXiv (Cornell University) · 2026-03-24
preprintOpen accessIn this work, we investigate Gaussian Mixture Models ({\it abbrv} GMM) and the related problem of non parametric maximum likelihood estimation ({\it abbrv} NPMLE) from the perspective of statistical mechanics. In particular, we establish stability guarantees for the NPMLE procedure that extend well beyond the state of the art. Crucially, we obtain guarantees on the Kullback-Leibler divergence between NPMLE estimators and the ground truth, a type of result which has been known to be challenging in the literature on this problem. In particular, we provide high probability upper bounds on the KL divergence between the NPMLE and the true density that are of the order of $\min\big\{\frac{(\log n)^{d+2}}{n} , \frac{\log n}{\sqrt n}\big\}$, which cover a wide range of scenarios for the comparative sizes of $n$ and $d$. We obtain similar guarantees for approximate solutions to the NPMLE problem, addressing realistic situations wherein optimization algorithms need to be stopped in finite time, allowing access only to approximations to the true NPMLE. A technical cornerstone of our approach is an analysis of the function class complexity of logarithms of gaussian mixture densities, which is able to handle their unboundedness, and could be of wider interest. We also establish correspondences between stability phenomena in the NPMLE problem and concepts from chaos and multiple valleys in random energy landscapes of statistical mechanics models. We believe that these correspondences may be useful for a wide variety of random optimization problems in statistics and machine learning, especially the connections to the the technical ingredients of concentration phenomena and Langevin dynamics for these models.
What Functions Does XGBoost Learn?
ArXiv.org · 2026-01-09
articleOpen accessSenior authorThis paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ that extends finite ensembles of bounded-depth regression trees, together with a complexity measure $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ that generalizes the $L^1$ regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ with penalty $V^{d, s}_{\infty-\text{XGB}}(\cdot)$, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of $\mathcal{F}^{d, s}_{\infty-\text{ST}}$ and $V^{d, s}_{\infty-\text{XGB}}(\cdot)$ in terms of Hardy--Krause variation. We prove that the least squares estimator over $\{f \in \mathcal{F}^{d, s}_{\infty-\text{ST}}: V^{d, s}_{\infty-\text{XGB}}(f) \le V\}$ achieves a nearly minimax-optimal rate of convergence $n^{-2/3} (\log n)^{4(\min(s, d) - 1)/3}$, thereby avoiding the curse of dimensionality. Our results provide the first rigorous characterization of the function space underlying XGBoost, clarify its connection to classical notions of variation, and identify an important open problem: whether the XGBoost algorithm itself achieves minimax optimality over this class.
arXiv (Cornell University) · 2025-01-08
preprintOpen accessSenior authorShape constraints in nonparametric regression provide a powerful framework for estimating regression functions under realistic assumptions without tuning parameters. However, most existing methods$\unicode{x2013}$except additive models$\unicode{x2013}$impose too weak restrictions, often leading to overfitting in high dimensions. Conversely, additive models can be too rigid, failing to capture covariate interactions. This paper introduces a novel multivariate shape-constrained regression approach based on total concavity, originally studied by T. Popoviciu. Our method allows interactions while mitigating the curse of dimensionality, with convergence rates that depend only logarithmically on the number of covariates. We characterize and compute the least squares estimator over totally concave functions, derive theoretical guarantees, and demonstrate its practical effectiveness through empirical studies on real-world datasets.
Convergence rates for estimating multivariate scale mixtures of uniform densities
Electronic Journal of Statistics · 2025-01-01
articleOpen accessSenior authorThe Grenander estimator is a well-studied procedure for univariate nonparametric density estimation. It is usually defined as the Maximum Likelihood Estimator (MLE) over the class of all non-increasing densities on the positive real line. It can also be seen as the MLE over the class of all scale mixtures of uniform densities. Using the latter viewpoint, Pavlides and Wellner [33] proposed a multivariate extension of the Grenander estimator as the nonparametric MLE over the class of all multivariate scale mixtures of uniform densities. We prove that this multivariate estimator achieves the univariate cube root rate of convergence with only a logarithmic multiplicative factor that depends on the dimension. The usual curse of dimensionality is therefore avoided to some extent for this multivariate estimator. This result positively resolves a conjecture of Pavlides and Wellner [33] under an additional lower bound assumption. Our proof proceeds via a general accuracy result for the Hellinger accuracy of MLEs over convex classes of densities. We also provide algorithms for computing the estimator, and illustrate performance on real and simulated datasets.
The Annals of Statistics · 2024-06-01 · 2 citations
articleSenior authorMultivariate adaptive regression splines (MARS) is a popular method for nonparametric regression introduced by Friedman in 1991. MARS fits simple nonlinear and non-additive functions to regression data. We propose and study a natural lasso variant of the MARS method. Our method is based on least squares estimation over a convex class of functions obtained by considering infinite-dimensional linear combinations of functions in the MARS basis and imposing a variation based complexity constraint. Our estimator can be computed via finite-dimensional convex optimization, although it is defined as a solution to an infinite-dimensional optimization problem. Under a few standard design assumptions, we prove that our estimator achieves a rate of convergence that depends only logarithmically on dimension and thus avoids the usual curse of dimensionality to some extent. We also show that our method is naturally connected to nonparametric estimation techniques based on smoothness constraints. We implement our method with a cross-validation scheme for the selection of the involved tuning parameter and compare it to the usual MARS method in various simulation and real data settings.
Recent grants
NSF · $258k · 2013–2017
CAREER: Nonparametric function estimation: shape constraints, adaptation, inference and beyond
NSF · $400k · 2017–2023
Frequent coauthors
- 24 shared
Bodhisattva Sen
- 11 shared
Xi Chen
Second Xiangya Hospital of Central South University
- 11 shared
Arlene K. H. Kim
Korea University
- 10 shared
Yuchen Zhang
Zhejiang University of Technology
- 8 shared
Sujayam Saha
Google (United States)
- 8 shared
Sabyasachi Chatterjee
- 7 shared
Billy Fang
Google (United States)
- 6 shared
Martin J. Wainwright
Massachusetts Institute of Technology
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Adityanand Guntuboyina
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup