Andrea Montanari


Stanford University · Statistics

Active 1970–2025

h-index 78

Citations 28.7k

Papers 673 · 97 last 5y

Funding $2.7M



About

Andrea Montanari is the John D. and Sigrid Banks Professor of Statistics and Mathematics at Stanford University. He holds a joint appointment in the Department of Statistics and the Wu Tsai Neurosciences Institute. He has also been appointed the Robert and Barbara Kleist Professor in the School of Engineering, a chair endowed in 1997 to honor faculty members in information systems technology. His research interests include high-dimensional statistics, machine learning, and probability theory.

Research topics

Machine Learning
Artificial Intelligence
Computer Science
Business
Risk analysis (engineering)
Engineering
Physics
Geometry
Quantum mechanics
Software engineering
Data science
Mathematical physics
Combinatorics
Mathematics

Selected publications

Ricostruzione, restituzione. Il secondo dopoguerra alle Officine Meccaniche Italiane nelle nuove fonti d'archivio (1945-1951)
PASSATO E PRESENTE · 2025-06-01
article · 1st author · Corresponding
Shattering in Pure Spherical Spin Glasses
Communications in Mathematical Physics · 2025-04-12 · 5 citations
article
Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks
HAL (Le Centre pour la Communication Scientifique Directe) · 2025-02-28
preprint · Open access · 1st author · Corresponding
Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well-established technique of non-equilibrium statistical physics. We show that, for large network width $m$ and a large number of samples per input dimension $n/d$, the training dynamics exhibit a separation of timescales which implies: (i) the emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity of the network; (ii) an inductive bias towards small complexity if the initialization has small enough complexity; (iii) a dynamical decoupling between feature learning and overfitting regimes; (iv) a non-monotone behavior of the test error, associated with a 'feature unlearning' regime at large times.
Provably Efficient Posterior Sampling for Sparse Linear Regression via Measure Decomposition
Journal of the American Statistical Association · 2025-07-31
article · 1st author
Un «affare triste». La Segreteria di Stato di Pio XII e l’occupazione delle “Reggiane” (1950-1952)
Clionet · 2025-01-01
article · Open access · 1st author · Corresponding
This article, drawing upon documents preserved in the Vatican Apostolic Archive in Rome concerning the pontificate of Pius XII (1939-1958), sheds initial light on the attention the Holy See devoted between 1951 and 1952 to the occupation (at 368 days, the longest in Italian history) and to the future of the Officine meccaniche italiane in Reggio Emilia, the famous “Reggiane”.
The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime
The Annals of Statistics · 2025-04-01 · 7 citations
article · 1st author · Corresponding
Sampling from mean-field Gibbs measures via diffusion processes
Probability and Mathematical Physics · 2025-07-21 · 1 citation
article · Open access
Local minima of the empirical risk in high dimension: General theorems and convex examples
ArXiv.org · 2025-02-04
preprint · Open access
We consider a general model for high-dimensional empirical risk minimization whereby the data $\mathbf{x}_i$ are $d$-dimensional Gaussian vectors, the model is parametrized by $\boldsymbol{\Theta}\in\mathbb{R}^{d\times k}$, and the loss depends on the data via the projection $\boldsymbol{\Theta}^\mathsf{T}\mathbf{x}_i$. This setting covers as special cases classical statistical methods (e.g. multinomial regression and other generalized linear models), but also two-layer fully connected neural networks with $k$ hidden neurons. We use the Kac-Rice formula from Gaussian process theory to derive a bound on the expected number of local minima of this empirical risk, under the proportional asymptotics in which $n,d\to\infty$, with $n\asymp d$. Via Markov's inequality, this bound allows us to determine the positions of these minimizers (with exponential deviation bounds) and hence derive sharp asymptotics on the estimation and prediction error. As a special case, we apply our characterization to convex losses. We show that our approach is tight and allows us to prove previously conjectured results. In addition, we characterize the spectrum of the Hessian at the minimizer. A companion paper applies our general result to non-convex examples.
On Smale's 17th problem over the reals
arXiv (Cornell University) · 2024-05-02 · 1 citation
preprint · Open access · 1st author · Corresponding
We consider the problem of efficiently solving a system of $n$ non-linear equations in ${\mathbb R}^d$. Addressing Smale's 17th problem stated in 1998, we consider a setting whereby the $n$ equations are random homogeneous polynomials of arbitrary degrees. In the complex case and for $n= d-1$, Beltrán and Pardo proved the existence of an efficient randomized algorithm and Lairez recently showed it can be de-randomized to produce a deterministic efficient algorithm. Here we consider the real setting, to which previously developed methods do not apply. We describe a polynomial time algorithm that finds solutions (with high probability) for $n= d -O(\sqrt{d\log d})$ if the maximal degree is bounded by $d^2$ and for $n=d-1$ if the maximal degree is larger than $d^2$.
Fundamental limits of low-rank matrix estimation with diverging aspect ratios
The Annals of Statistics · 2024-08-01 · 3 citations
article · 1st author · Corresponding

Recent grants

CIF: Small: Learning and estimation with rough non-convex objectives: Fundamental limits and efficient algorithms
NSF · $330k · 2020–2023
The game dynamics of social interaction: Algorithms and applications
NSF · $500k · 2009–2013
CIF: Small: Information-theoretic and Computational Thresholds in Statistical Learning
NSF · $450k · 2017–2021
BIGDATA: F: Reliable Inference with Big Data: Reproducibility, Data Sharing, Heterogeneity
NSF · $650k · 2017–2021
CIF: Small: Optimal Iterative Estimation in Signal Processing, Information Theory and Machine Learning
NSF · $416k · 2013–2018

Frequent coauthors

Rüdiger Urbanke
78 shared
Adel Javanmard
58 shared
Sergio Caracciolo
University of Milan
48 shared
Cyril Méasson
Nokia (France)
47 shared
Andrea Pelissetto
Istituto Nazionale di Fisica Nucleare, Roma Tor Vergata
41 shared
Marc Mézard
Bocconi University
37 shared
Federico Ricci‐Tersenghi
Istituto Nanoscienze
35 shared
Mei Song
Beijing University of Posts and Telecommunications
31 shared

Awards & honors

John D. and Sigrid Banks Professor
Robert and Barbara Kleist Professor in the School of Enginee…
2021 IMS Medallion Lecture
