
Jelena Diakonikolas
Assistant Professor · University of Wisconsin-Madison · Computer Sciences
Active 2017–2026
About
Jelena Diakonikolas is an Assistant Professor at the Department of Computer Sciences and (by courtesy) the Department of Statistics at the University of Wisconsin-Madison. She is also an affiliate of the Data Science Institute at UW-Madison. Her main research interests are in the area of large-scale optimization, with a particular focus on applications within machine learning. Her research encompasses first-order convex and non-convex optimization algorithms, momentum-based methods, variational inequalities, fixed-point iterations, and optimization in learning applications, including width-independent algorithms and lower bounds in optimization. Prior to her current position, she held a Postdoctoral Fellowship at UC Berkeley's Foundations of Data Analysis (FODA) TRIPODS Institute, working primarily with Mike Jordan. She was also a Microsoft Research Fellow at the Simons Institute for the Theory of Computing, associated with the Foundations of Data Science program. Her academic background includes a Ph.D. from Columbia University in Electrical Engineering, where she was co-advised by Gil Zussman and Cliff Stein. Her research contributions are recognized through numerous honors and awards, including the NSF CAREER Award, Google ML & Systems Junior Faculty Award, and UW-Madison L&S Distinguished Honors Faculty Award. Diakonikolas is actively involved in the research community, serving on editorial boards, organizing workshops and symposia, and participating in conference program committees. She has organized and co-organized various workshops and mini-symposia related to optimization, machine learning, and data science, and is engaged in mentoring students at both undergraduate and graduate levels. Her work has been presented at numerous international conferences, where she has delivered invited and keynote talks on topics such as optimization algorithms, learning problems, and computational gaps in learning and optimization.
Research topics
- Computer Science
- Mathematics
- Machine Learning
- Mathematical optimization
- Artificial Intelligence
- Distributed computing
- Telecommunications
- Applied mathematics
- Computer network
Selected publications
Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective
SIAM Journal on Optimization · 2026-02-17
article · 1st author · Corresponding
Adaptive Delayed-Update Cyclic Algorithm for Variational Inequalities
arXiv (Cornell University) · 2026-03-31
preprint · Open access · Senior author
Cyclic block coordinate methods are a fundamental class of first-order algorithms, widely used in practice for their simplicity and strong empirical performance. Yet, their theoretical behavior remains challenging to explain, and setting their step sizes -- beyond classical coordinate descent for minimization -- typically requires careful tuning or line-search machinery. In this work, we develop $\texttt{ADUCA}$ (Adaptive Delayed-Update Cyclic Algorithm), a cyclic algorithm addressing a broad class of Minty variational inequalities with monotone Lipschitz operators. $\texttt{ADUCA}$ is parameter-free: it requires no global or block-wise Lipschitz constants and uses no per-epoch line search, except at initialization. A key feature of the algorithm is using operator information delayed by a full cycle, which makes the algorithm compatible with parallel and distributed implementations, and attractive due to weakened synchronization requirements across blocks. We prove that $\texttt{ADUCA}$ attains (near) optimal global oracle complexity as a function of target error $ε>0$, scaling with $1/ε$ for monotone operators, or with $\log^2(1/ε)$ for operators that are strongly monotone.
Outlier-robust nonsmooth stochastic optimization
Journal of Nonlinear and Variational Analysis · 2026-01-01
article · Open access · Senior author
Robust Learning of a Group DRO Neuron
arXiv (Cornell University) · 2026-01-26
preprint · Open access · Senior author
We study the problem of learning a single neuron under standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a "best-fit" neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $\mathcal{p}_{[1]},\dots,\mathcal{p}_{[K]}$, we seek to approximate $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbol{λ} \in Δ_K$, where the objective is $\sum_{i \in [K]} λ_{[i]}\,\mathbb{E}_{(\mathbf{x},y)\sim\mathcal{p}_{[i]}}(σ(\mathbf{w}\cdot\mathbf{x})-y)^2 - ν\,d_f(\boldsymbol{λ},\frac{1}{K}\mathbf{1})$ and $d_f$ is an $f$-divergence that imposes (optional) penalty on deviations from uniform group weights, scaled by a parameter $ν\geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf{w}}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. The implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.
Distributionally Robust Optimization with Adversarial Data Contamination
ArXiv.org · 2025-07-14
preprint · Open access · Senior author
Distributionally Robust Optimization (DRO) provides a framework for decision-making under distributional uncertainty, yet its effectiveness can be compromised by outliers in the training data. This paper introduces a principled approach to simultaneously address both challenges. We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions, where an $ε$-fraction of the training data is adversarially corrupted. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts, alongside an efficient algorithm inspired by robust statistics to solve the resulting optimization problem. We prove that our method achieves an estimation error of $O(\sqrt{ε})$ for the true DRO objective value using only the contaminated data under the bounded covariance assumption. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
ACM / IMS Journal of Data Science · 2025-05-13
article · Open access · 1st author · Corresponding
Block coordinate methods have been extensively studied for minimization problems, where they come with significant complexity improvements whenever the considered problems are compatible with block decomposition and, moreover, block Lipschitz parameters are highly nonuniform. For the more general class of variational inequalities with monotone operators, essentially none of the existing methods transparently shows potential complexity benefits of using block coordinate updates in such settings. Motivated by this gap, we develop a new randomized block coordinate method and study its oracle complexity and runtime. We prove that in the setting where block Lipschitz parameters are highly nonuniform (the main setting in which block coordinate methods lead to high complexity improvements in any of the previously studied settings), our method can lead to complexity improvements by a factor of order $m$, where $m$ is the number of coordinate blocks. The same method further applies to the more general problem with a finite-sum operator with $m$ components, where it can be interpreted as performing variance reduction. Compared to the state of the art, the method leads to complexity improvements up to a factor $\sqrt{m}$, obtained when the component Lipschitz parameters are highly nonuniform.
Robustly Learning Monotone Generalized Linear Models via Data Augmentation
ArXiv.org · 2025-02-12
preprint · Open access · Senior author
We study the task of learning Generalized Linear Models (GLMs) in the agnostic model under the Gaussian distribution. We give the first polynomial-time algorithm that achieves a constant-factor approximation for any monotone Lipschitz activation. Prior constant-factor GLM learners succeed for a substantially smaller class of activations. Our work resolves a well-known open problem, by developing a robust counterpart to the classical GLMtron algorithm (Kakade et al., 2011). Our robust learner applies more generally, encompassing all monotone activations with bounded $(2+ζ)$-moments, for any fixed $ζ>0$ -- a condition that is essentially necessary. To obtain our results, we leverage a novel data augmentation technique with decreasing Gaussian noise injection and prove a number of structural results that may be useful in other settings.
Linear Regression under Missing or Corrupted Coordinates
ArXiv.org · 2025-09-23
preprint · Open access
We study multivariate linear regression under Gaussian covariates in two settings, where data may be erased or corrupted by an adversary under a coordinate-wise budget. In the incomplete data setting, an adversary may inspect the dataset and delete entries in up to an $η$-fraction of samples per coordinate, a strong form of the Missing Not At Random model. In the corrupted data setting, the adversary instead replaces values arbitrarily, and the corruption locations are unknown to the learner. Despite substantial work on missing data, linear regression under such adversarial missingness remains poorly understood, even information-theoretically. Unlike the clean setting, where estimation error vanishes with more samples, here the optimal error remains a positive function of the problem parameters. Our main contribution is to characterize this error up to constant factors across essentially the entire parameter range. Specifically, we establish novel information-theoretic lower bounds on the achievable error that match the error of (computationally efficient) algorithms. A key implication is that, perhaps surprisingly, the optimal error in the missing data setting matches that in the corruption setting, so knowing the corruption locations offers no general advantage.
Frequent coauthors
- Lorenzo Orecchia · 17 shared
- Chaobing Song · 10 shared
- Cristóbal Guzmán (Pontificia Universidad Católica de Chile) · 9 shared
- Gil Zussman (Columbia University) · 8 shared
- Ilias Diakonikolas · 7 shared
- Tingjun Chen · 7 shared
- Michael I. Jordan · 7 shared
- Nikos Zarifis · 5 shared
Labs
Optimization for Learning · PI
Education
Ph.D., Electrical Engineering
Columbia University
Other, Computer Science
Boston University
Other, Foundations of Data Analysis (FODA) TRIPODS Institute
UC Berkeley
Awards & honors
- UW-Madison L&S Distinguished Honors Faculty Award (2026)
- Google ML & Systems Junior Faculty Award (2025)
- NSF CAREER Award (2025)
- AFOSR Young Investigator Program award (2024)
- UW-Madison Provost's Award for Mentoring Undergraduates In R…