
Andrea Montanari
Verified · Stanford University · Statistics
Active 1970–2025
About
Andrea Montanari is the John D. and Sigrid Banks Professor of Statistics and Mathematics at Stanford University. He holds a joint appointment in the Department of Statistics and the Wu Tsai Neurosciences Institute. He has also been appointed Robert and Barbara Kleist Professor in the School of Engineering, a chair endowed in 1997 for faculty members in information systems technology. His research interests include high-dimensional statistics, machine learning, and probability theory.
Research topics
- Machine Learning
- Artificial Intelligence
- Computer Science
- Business
- Risk analysis (engineering)
- Engineering
- Physics
- Geometry
- Quantum mechanics
- Software engineering
- Data science
- Mathematical physics
- Combinatorics
- Mathematics
Selected publications
PASSATO E PRESENTE · 2025-06-01
article · 1st author · Corresponding
Shattering in Pure Spherical Spin Glasses
Communications in Mathematical Physics · 2025-04-12 · 5 citations
article
Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks
HAL (Le Centre pour la Communication Scientifique Directe) · 2025-02-28
preprint · Open access · 1st author · Corresponding
Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires to characterize the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well established technique of non-equilibrium statistical physics. We show that, for large network width $m$, and large number of samples per input dimension $n/d$, the training dynamics exhibits a separation of timescales which implies: $(i)$ The emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity of the network; $(ii)$ Inductive bias towards small complexity if the initialization has small enough complexity; $(iii)$ A dynamical decoupling between feature learning and overfitting regimes; $(iv)$ A non-monotone behavior of the test error, associated 'feature unlearning' regime at large times.
Provably Efficient Posterior Sampling for Sparse Linear Regression via Measure Decomposition
Journal of the American Statistical Association · 2025-07-31
article · 1st author
Un «affare triste». La Segreteria di Stato di Pio XII e l’occupazione delle “Reggiane” (1950-1952)
Clionet · 2025-01-01
article · Open access · 1st author · Corresponding
This article, drawing upon documents preserved in the Vatican Apostolic Archive in Rome concerning the pontificate of Pius XII (1939-1958), sheds initial light on the attention the Holy See devoted between 1951 and 1952 to the occupation – at 368 days, the longest in Italian history – and the future of the Officine meccaniche italiane in Reggio Emilia, the famous “Reggiane”.
The Annals of Statistics · 2025-04-01 · 7 citations
article · 1st author · Corresponding
Sampling from mean-field Gibbs measures via diffusion processes
Probability and Mathematical Physics · 2025-07-21 · 1 citation
article · Open access
Local minima of the empirical risk in high dimension: General theorems and convex examples
ArXiv.org · 2025-02-04
preprint · Open access
We consider a general model for high-dimensional empirical risk minimization whereby the data $\mathbf{x}_i$ are $d$-dimensional Gaussian vectors, the model is parametrized by $\mathbf{\Theta}\in\mathbb{R}^{d\times k}$, and the loss depends on the data via the projection $\mathbf{\Theta}^\mathsf{T}\mathbf{x}_i$. This setting covers as special cases classical statistics methods (e.g. multinomial regression and other generalized linear models), but also two-layer fully connected neural networks with $k$ hidden neurons. We use the Kac-Rice formula from Gaussian process theory to derive a bound on the expected number of local minima of this empirical risk, under the proportional asymptotics in which $n,d\to\infty$, with $n\asymp d$. Via Markov's inequality, this bound allows to determine the positions of these minimizers (with exponential deviation bounds) and hence derive sharp asymptotics on the estimation and prediction error. As a special case, we apply our characterization to convex losses. We show that our approach is tight and allows to prove previously conjectured results. In addition, we characterize the spectrum of the Hessian at the minimizer. A companion paper applies our general result to non-convex examples.
On Smale's 17th problem over the reals
arXiv (Cornell University) · 2024-05-02 · 1 citation
preprint · Open access · 1st author · Corresponding
We consider the problem of efficiently solving a system of $n$ non-linear equations in ${\mathbb R}^d$. Addressing Smale's 17th problem stated in 1998, we consider a setting whereby the $n$ equations are random homogeneous polynomials of arbitrary degrees. In the complex case and for $n= d-1$, Beltrán and Pardo proved the existence of an efficient randomized algorithm and Lairez recently showed it can be de-randomized to produce a deterministic efficient algorithm. Here we consider the real setting, to which previously developed methods do not apply. We describe a polynomial time algorithm that finds solutions (with high probability) for $n= d -O(\sqrt{d\log d})$ if the maximal degree is bounded by $d^2$ and for $n=d-1$ if the maximal degree is larger than $d^2$.
Fundamental limits of low-rank matrix estimation with diverging aspect ratios
The Annals of Statistics · 2024-08-01 · 3 citations
article · 1st author · Corresponding
Recent grants
NSF · $330k · 2020–2023
The game dynamics of social interaction: Algorithms and applications
NSF · $500k · 2009–2013
CIF:Small:Information-theoretic and Computational Thresholds in Statistical Learning
NSF · $450k · 2017–2021
BIGDATA: F: Reliable Inference with Big Data: Reproducibility, Data Sharing, Heterogeneity
NSF · $650k · 2017–2021
NSF · $416k · 2013–2018
Frequent coauthors
- Rüdiger Urbanke · 78 shared
- Adel Javanmard · 58 shared
- Sergio Caracciolo · University of Milan · 48 shared
- Cyril Méasson · Nokia (France) · 47 shared
- Andrea Pelissetto · Istituto Nazionale di Fisica Nucleare, Roma Tor Vergata · 41 shared
- Marc Mézard · Bocconi University · 37 shared
- Federico Ricci‐Tersenghi · Istituto Nanoscienze · 35 shared
- Mei Song · Beijing University of Posts and Telecommunications · 31 shared
Awards & honors
- John D. and Sigrid Banks Professor
- Robert and Barbara Kleist Professor in the School of Enginee…
- 2021 IMS Medallion Lecture