Madeleine Udell

· Assistant Professor of Management Science and Engineering and, by courtesy, of Electrical EngineeringVerified

Stanford University · Management Science and Engineering

Active 2012–2026

h-index26

Citations2.7k

Papers15384 last 5y

Funding$621k

Faculty page

See your match with Madeleine Udell — sign in to PhdFit.Sign in

About

Madeleine Udell is an Assistant Professor of Management Science and Engineering at Stanford University, with a courtesy appointment in Electrical Engineering. She completed her PhD at Stanford in Computational and Mathematical Engineering and held a postdoctoral fellowship at the Center for the Mathematics of Information at Caltech. Her research focuses on accelerating and simplifying large-scale data analysis and optimization, aiming to address challenges across healthcare, finance, marketing, operations, and engineering systems design. Her work in optimization seeks to detect and exploit novel structures, leading to faster, more memory-efficient algorithms, automatic proofs of optimality, improved complexity guarantees, and user-friendly optimization solvers and modeling languages. In machine learning, her research centers on data preprocessing, interpretability, and causality, which are critical for practical applications involving messy data.

Research topics

Computer Science
Mathematics
Mathematical optimization
Algorithm
Operating system
Combinatorics
Theoretical computer science
Physics
Geometry
Applied mathematics

Selected publications

Randomized Nyström Preconditioned Interior Point-Proximal Method of Multipliers
SIAM Journal on Scientific Computing · 2026-01-02
articleSenior author
Publisher DOI
SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints
Open MIND · 2026-02-10
preprintSenior author
Neural networks are increasingly used as fast surrogate models across various domains, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture to learn mappings whose outputs must satisfy input-dependent constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. We stabilize end-to-end training by adaptive relaxation, a new training paradigm that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work, and it is the first to enforce non-convex constraints at medium-to-high precision robustly across instances.
DOI
SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints
ArXiv.org · 2026-02-10
articleOpen accessSenior author
Neural networks are increasingly used as fast surrogate models across various domains, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture to learn mappings whose outputs must satisfy input-dependent constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. We stabilize end-to-end training by adaptive relaxation, a new training paradigm that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work, and it is the first to enforce non-convex constraints at medium-to-high precision robustly across instances.
Publisher OA PDF
Scalable approximate optimal diagonal preconditioning
Computational Optimization and Applications · 2026-02-14
article
Publisher DOI
Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project
ArXiv.org · 2025-05-19
preprintOpen accessSenior author
Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model uncertainty. However, GP inference struggles to scale to large datasets (which are common in modern applications), since it requires the solution of a linear system whose size scales quadratically with the number of samples in the dataset. We propose an approximate, distributed, accelerated sketch-and-project algorithm ($\texttt{ADASAP}$) for solving these linear systems, which improves scalability. We use the theory of determinantal point processes to show that the posterior mean induced by sketch-and-project rapidly converges to the true posterior mean. In particular, this yields the first efficient, condition number-free algorithm for estimating the posterior mean along the top spectral basis functions, showing that our approach is principled for GP inference. $\texttt{ADASAP}$ outperforms state-of-the-art solvers based on conjugate gradient and coordinate descent across several benchmark datasets and a large-scale Bayesian optimization task. Moreover, $\texttt{ADASAP}$ scales to a dataset with $> 3 \cdot 10^8$ samples, a feat which has not been accomplished in the literature.
Publisher OA PDF DOI
Diverse Inference and Verification for Advanced Reasoning
ArXiv.org · 2025-02-14
preprintOpen accessSenior author
Reasoning LLMs such as OpenAI o1, o3 and DeepSeek R1 have made significant progress in mathematics and coding, yet find challenging advanced tasks such as International Mathematical Olympiad (IMO) combinatorics problems, Abstraction and Reasoning Corpus (ARC) puzzles, and Humanity's Last Exam (HLE) questions. We use a diverse inference approach that combines multiple models and methods at test time. We find that verifying mathematics and code problems, and rejection sampling on other problems is simple and effective. We automatically verify correctness of solutions to IMO problems by Lean, and ARC puzzles by code, and find that best-of-N effectively answers HLE questions. Our approach increases answer accuracy on IMO combinatorics problems from 33.3% to 77.8%, accuracy on HLE questions from 8% to 37%, and solves 80% of ARC puzzles that 948 humans could not and 26.5% of ARC puzzles that o3 high compute does not. Test-time simulations, reinforcement learning, and meta-learning with inference feedback improve generalization by adapting agent graph representations and varying prompts, code, and datasets. Our approach is reliable, robust, and scalable, and in the spirit of reproducible research, we will make it publicly available upon publication.
Publisher OA PDF DOI
New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes
arXiv (Cornell University) · 2025-12-06
preprintOpen access
In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of PolyakGD by explicitly constructing worst-case functions. In particular, we show that the $O((1-\frac{1}κ)^K)$ rate for smooth strongly convex functions and the $O(1/K)$ rate for smooth convex functions are both tight. Moreover, we theoretically show that PolyakGD automatically exploits floating-point errors to escape the worst-case behavior. Our second main result provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions. These findings show that the Polyak stepsize is universal, automatically adapting to various function classes without requiring prior knowledge of problem parameters.
Publisher OA PDF DOI
SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning
ArXiv.org · 2025-01-27
preprintOpen accessSenior author
Regularized empirical risk minimization (rERM) has become important in data-intensive fields such as genomics and advertising, with stochastic gradient methods typically used to solve the largest problems. However, ill-conditioned objectives and non-smooth regularizers undermine the performance of traditional stochastic gradient methods, leading to slow convergence and significant computational costs. To address these challenges, we propose the $\texttt{SAPPHIRE}$ ($\textbf{S}$ketching-based $\textbf{A}$pproximations for $\textbf{P}$roximal $\textbf{P}$reconditioning and $\textbf{H}$essian $\textbf{I}$nexactness with Variance-$\textbf{RE}$educed Gradients) algorithm, which integrates sketch-based preconditioning to tackle ill-conditioning and uses a scaled proximal mapping to minimize the non-smooth regularizer. This stochastic variance-reduced algorithm achieves condition-number-free linear convergence to the optimum, delivering an efficient and scalable solution for ill-conditioned composite large-scale convex machine learning problems. Extensive experiments on lasso and logistic regression demonstrate that $\texttt{SAPPHIRE}$ often converges $20$ times faster than other common choices such as $\texttt{Catalyst}$, $\texttt{SAGA}$, and $\texttt{SVRG}$. This advantage persists even when the objective is non-convex or the preconditioner is infrequently updated, highlighting its robust and practical effectiveness.
Publisher OA PDF DOI
PDE-SHARP: PDE Solver Hybrids through Analysis and Refinement Passes
ArXiv.org · 2025-10-31
preprintOpen accessSenior author
Current LLM-driven approaches using test-time computing to generate PDE solvers execute a large number of solver samples to identify high-accuracy solvers. These paradigms are especially costly for complex PDEs requiring substantial computational resources for numerical evaluation. We introduce PDE-SHARP, a framework to reduce computational costs by replacing expensive scientific computation by cheaper LLM inference that achieves superior solver accuracy with 60-75% fewer computational evaluations. PDE-SHARP employs three stages: (1) Analysis: mathematical chain-of-thought analysis including PDE classification, solution type detection, and stability analysis; (2) Genesis: solver generation based on mathematical insights from the previous stage; and (3) Synthesis: collaborative selection-hybridization tournaments in which LLM judges iteratively refine implementations through flexible performance feedback. To generate high-quality solvers, PDE-SHARP requires fewer than 13 solver evaluations on average compared to 30+ for baseline methods, improving accuracy uniformly across tested PDEs by $4\times$ on average, and demonstrates robust performance across LLM architectures, from general-purpose to specialized reasoning models.
Publisher OA PDF DOI
dnamite: A Python Package for Neural Additive Models
ArXiv.org · 2025-03-06
preprintOpen accessSenior author
Additive models offer accurate and interpretable predictions for tabular data, a critical tool for statistical modeling. Recent advances in Neural Additive Models (NAMs) allow these models to handle complex machine learning tasks, including feature selection and survival analysis, on large-scale data. This paper introduces dnamite, a Python package that implements NAMs for these advanced applications. dnamite provides a scikit-learn style interface to train regression, classification, and survival analysis NAMs, with built-in support for feature selection. We describe the methodology underlying dnamite, its design principles, and its implementation. Through an application to the MIMIC III clinical dataset, we demonstrate the utility of dnamite in a real-world setting where feature selection and survival analysis are both important. The package is publicly available via pip and documented at dnamite.readthedocs.io.
Publisher OA PDF DOI

Recent grants

CAREER: accelerating machine learning with low dimensional structure
NSF · $167k · 2020–2022
CAREER: accelerating machine learning with low dimensional structure
NSF · $454k · 2022–2025

Frequent coauthors

Joel A. Tropp
California Institute of Technology
22 shared
Lijun Ding
Wisconsin Institutes for Discovery
21 shared
Volkan Cevher
16 shared
Stephen Boyd
15 shared
Alp Yurtsever
15 shared
Zachary Frangella
Stanford University
14 shared
Nathan Kallus
13 shared
Iddo Drori
13 shared

Education

Ph.D., Management Science and Engineering
Stanford University
2010
M.S., Management Science and Engineering
Stanford University
2006
B.S., Electrical Engineering and Computer Science
Massachusetts Institute of Technology
2002

Awards & honors

Kavli Fellowship (2023)
Alfred P. Sloan Research Fellowship (2021)
NSF CAREER award (2020)
ONR Young Investigator Award (2020)

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Madeleine Udell

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you