
Madeleine Udell
· Assistant Professor of Management Science and Engineering and, by courtesy, of Electrical EngineeringVerifiedStanford University · Management Science and Engineering
Active 2012–2026
About
Madeleine Udell is an Assistant Professor of Management Science and Engineering at Stanford University, with a courtesy appointment in Electrical Engineering. She completed her PhD at Stanford in Computational and Mathematical Engineering and held a postdoctoral fellowship at the Center for the Mathematics of Information at Caltech. Her research focuses on accelerating and simplifying large-scale data analysis and optimization, aiming to address challenges across healthcare, finance, marketing, operations, and engineering systems design. Her work in optimization seeks to detect and exploit novel structures, leading to faster, more memory-efficient algorithms, automatic proofs of optimality, improved complexity guarantees, and user-friendly optimization solvers and modeling languages. In machine learning, her research centers on data preprocessing, interpretability, and causality, which are critical for practical applications involving messy data.
Research topics
- Computer Science
- Mathematics
- Mathematical optimization
- Algorithm
- Operating system
- Combinatorics
- Theoretical computer science
- Physics
- Geometry
- Applied mathematics
Selected publications
Randomized Nyström Preconditioned Interior Point-Proximal Method of Multipliers
SIAM Journal on Scientific Computing · 2026-01-02
articleSenior authorSnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints
Open MIND · 2026-02-10
preprintSenior authorNeural networks are increasingly used as fast surrogate models across various domains, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture to learn mappings whose outputs must satisfy input-dependent constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. We stabilize end-to-end training by adaptive relaxation, a new training paradigm that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work, and it is the first to enforce non-convex constraints at medium-to-high precision robustly across instances.
SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints
ArXiv.org · 2026-02-10
articleOpen accessSenior authorNeural networks are increasingly used as fast surrogate models across various domains, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture to learn mappings whose outputs must satisfy input-dependent constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. We stabilize end-to-end training by adaptive relaxation, a new training paradigm that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work, and it is the first to enforce non-convex constraints at medium-to-high precision robustly across instances.
Scalable approximate optimal diagonal preconditioning
Computational Optimization and Applications · 2026-02-14
articleTurbocharging Gaussian Process Inference with Approximate Sketch-and-Project
ArXiv.org · 2025-05-19
preprintOpen accessSenior authorGaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number of samples in the dataset. We propose an approximate, distributed, accelerated sketch-and-project algorithm ($\texttt{ADASAP}$) for solving these linear systems, which improves scalability. We use the theory of determinantal point processes to show that the posterior mean induced by sketch-and-project rapidly converges to the true posterior mean. In particular, this yields the first efficient, condition number-free algorithm for estimating the posterior mean along the top spectral basis functions, showing that our approach is principled for GP inference. $\texttt{ADASAP}$ outperforms state-of-the-art solvers based on conjugate gradient and coordinate descent across several benchmark datasets and a large-scale Bayesian optimization task. Moreover, $\texttt{ADASAP}$ scales to a dataset with $> 3 \cdot 10^8$ samples, a feat which has not been accomplished in the literature.
Diverse Inference and Verification for Advanced Reasoning
ArXiv.org · 2025-02-14
preprintOpen accessSenior authorReasoning LLMs such as OpenAI o1, o3 and DeepSeek R1 have made significant progress in mathematics and coding, yet find challenging advanced tasks such as International Mathematical Olympiad (IMO) combinatorics problems, Abstraction and Reasoning Corpus (ARC) puzzles, and Humanity's Last Exam (HLE) questions. We use a diverse inference approach that combines multiple models and methods at test time. We find that verifying mathematics and code problems, and rejection sampling on other problems is simple and effective. We automatically verify correctness of solutions to IMO problems by Lean, and ARC puzzles by code, and find that best-of-N effectively answers HLE questions. Our approach increases answer accuracy on IMO combinatorics problems from 33.3% to 77.8%, accuracy on HLE questions from 8% to 37%, and solves 80% of ARC puzzles that 948 humans could not and 26.5% of ARC puzzles that o3 high compute does not. Test-time simulations, reinforcement learning, and meta-learning with inference feedback improve generalization by adapting agent graph representations and varying prompts, code, and datasets. Our approach is reliable, robust, and scalable, and in the spirit of reproducible research, we will make it publicly available upon publication.
New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes
arXiv (Cornell University) · 2025-12-06
preprintOpen accessIn this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of PolyakGD by explicitly constructing worst-case functions. In particular, we show that the $O((1-\frac{1}κ)^K)$ rate for smooth strongly convex functions and the $O(1/K)$ rate for smooth convex functions are both tight. Moreover, we theoretically show that PolyakGD automatically exploits floating-point errors to escape the worst-case behavior. Our second main result provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions. These findings show that the Polyak stepsize is universal, automatically adapting to various function classes without requiring prior knowledge of problem parameters.
SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning
ArXiv.org · 2025-01-27
preprintOpen accessSenior authorRegularized empirical risk minimization (rERM) has become important in data-intensive fields such as genomics and advertising, with stochastic gradient methods typically used to solve the largest problems. However, ill-conditioned objectives and non-smooth regularizers undermine the performance of traditional stochastic gradient methods, leading to slow convergence and significant computational costs. To address these challenges, we propose the $\texttt{SAPPHIRE}$ ($\textbf{S}$ketching-based $\textbf{A}$pproximations for $\textbf{P}$roximal $\textbf{P}$reconditioning and $\textbf{H}$essian $\textbf{I}$nexactness with Variance-$\textbf{RE}$educed Gradients) algorithm, which integrates sketch-based preconditioning to tackle ill-conditioning and uses a scaled proximal mapping to minimize the non-smooth regularizer. This stochastic variance-reduced algorithm achieves condition-number-free linear convergence to the optimum, delivering an efficient and scalable solution for ill-conditioned composite large-scale convex machine learning problems. Extensive experiments on lasso and logistic regression demonstrate that $\texttt{SAPPHIRE}$ often converges $20$ times faster than other common choices such as $\texttt{Catalyst}$, $\texttt{SAGA}$, and $\texttt{SVRG}$. This advantage persists even when the objective is non-convex or the preconditioner is infrequently updated, highlighting its robust and practical effectiveness.
PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes
ArXiv.org · 2025-10-31
preprintOpen accessSenior authorCurrent LLM-driven approaches using test-time computing to generate PDE solvers execute a large number of solver samples to identify high-accuracy solvers. These paradigms are especially costly for complex PDEs requiring substantial computational resources for numerical evaluation. We introduce PDE-SHARP, a framework to reduce computational costs by replacing expensive scientific computation by cheaper LLM inference that achieves superior solver accuracy with 60-75% fewer computational evaluations. PDE-SHARP employs three stages: (1) Analysis: mathematical chain-of-thought analysis including PDE classification, solution type detection, and stability analysis; (2) Genesis: solver generation based on mathematical insights from the previous stage; and (3) Synthesis: collaborative selection-hybridization tournaments in which LLM judges iteratively refine implementations through flexible performance feedback. To generate high-quality solvers, PDE-SHARP requires fewer than 13 solver evaluations on average compared to 30+ for baseline methods, improving accuracy uniformly across tested PDEs by $4\times$ on average, and demonstrates robust performance across LLM architectures, from general-purpose to specialized reasoning models.
dnamite: A Python Package for Neural Additive Models
ArXiv.org · 2025-03-06
preprintOpen accessSenior authorAdditive models offer accurate and interpretable predictions for tabular data, a critical tool for statistical modeling. Recent advances in Neural Additive Models (NAMs) allow these models to handle complex machine learning tasks, including feature selection and survival analysis, on large-scale data. This paper introduces dnamite, a Python package that implements NAMs for these advanced applications. dnamite provides a scikit-learn style interface to train regression, classification, and survival analysis NAMs, with built-in support for feature selection. We describe the methodology underlying dnamite, its design principles, and its implementation. Through an application to the MIMIC III clinical dataset, we demonstrate the utility of dnamite in a real-world setting where feature selection and survival analysis are both important. The package is publicly available via pip and documented at dnamite.readthedocs.io.
Recent grants
CAREER: accelerating machine learning with low dimensional structure
NSF · $167k · 2020–2022
CAREER: accelerating machine learning with low dimensional structure
NSF · $454k · 2022–2025
Frequent coauthors
- 22 shared
Joel A. Tropp
California Institute of Technology
- 21 shared
Lijun Ding
Wisconsin Institutes for Discovery
- 16 shared
Volkan Cevher
- 15 shared
Stephen Boyd
- 15 shared
Alp Yurtsever
- 14 shared
Zachary Frangella
Stanford University
- 13 shared
Nathan Kallus
- 13 shared
Iddo Drori
Education
- 2010
Ph.D., Management Science and Engineering
Stanford University
- 2006
M.S., Management Science and Engineering
Stanford University
- 2002
B.S., Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Awards & honors
- Kavli Fellowship (2023)
- Alfred P. Sloan Research Fellowship (2021)
- NSF CAREER award (2020)
- ONR Young Investigator Award (2020)
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Madeleine Udell
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup