Necdet Serhat Aybat

· ProfessorVerified

Pennsylvania State University · Industrial and Manufacturing Engineering

Active 2010–2026

h-index20

Citations1.1k

Papers9924 last 5y

Funding$236k

Faculty page Lab page

See your match with Necdet Serhat Aybat — sign in to PhdFit.Sign in

About

Necdet Serhat Aybat is a Professor in the Department of Industrial and Manufacturing Engineering at Penn State University. His research interests include operations research with a focus on first order methods for large-scale convex optimization, compressive sensing, matrix rank minimization, robust and stable principal component analysis, and distributed algorithms. He has contributed to the development of efficient algorithms for robust low-rank and sparse matrix decomposition, as well as methods for compressed sensing and convex optimization. Professor Aybat has authored a handbook on robust low-rank and sparse matrix decomposition and has published extensively in reputable journals and conference proceedings. His work often addresses large-scale, distributed, and stochastic optimization problems, with applications spanning image and video processing, signal processing, network utility maximization, and multi-agent systems. His research combines theoretical advancements with practical algorithmic solutions, making significant contributions to the fields of optimization, machine learning, and control systems.

Research topics

Mathematics
Mathematical optimization
Computer science
Algorithm
Combinatorics

Selected publications

A Randomized Block-Coordinate Primal-Dual Method for Large-Scale Stochastic Saddle Point Problems
INFORMS Journal on Optimization · 2026-02-25
articleOpen accessSenior author
We consider (stochastic) convex-concave saddle point (SP) problems with high-dimensional decision variables, arising in various applications including machine learning problems. To contend with the challenges in computing full gradients, we employ a randomized block-coordinate primal-dual scheme in which randomly selected primal and dual blocks of variables are updated. We consider both deterministic and stochastic settings, where deterministic partial gradients and their randomly sampled estimates are used, respectively, at each iteration. We investigate the convergence of the proposed method under different blocking strategies and provide the corresponding complexity results. Although the best-known computational complexity result for computing a saddle point with [Formula: see text] primal-dual gap for deterministic primal-dual methods using full gradients is [Formula: see text], where m and n denote the dimensions of primal and dual variables, respectively, we show that our proposed randomized block-coordinate method achieves an improved complexity of [Formula: see text] assuming a coordinate-friendly structure on the problem. Moreover, for the stochastic setting where a mini-batch sample gradient is utilized, we show a computational complexity of [Formula: see text] through acceleration. Finally, almost sure convergence of the iterate sequence to a saddle point is established. Funding: N. Serhat Aybat was supported by the Office of Naval Research [Grant N00014-24-1-2666]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoo.2024.0056 .
Publisher DOI
Accelerated gradient methods with biased gradient estimates: Risk sensitivity, high-probability guarantees, and large deviation bounds
Journal of Nonlinear and Variational Analysis · 2026-01-01
articleOpen accessSenior author
Publisher OA PDF DOI
Accelerated Gradient Methods with Biased Gradient Estimates: Risk Sensitivity, High-Probability Guarantees, and Large Deviation Bounds
ArXiv.org · 2025-09-17
preprintOpen accessSenior author
We study trade-offs between convergence rate and robustness to gradient errors in the context of first-order methods. Our focus is on generalized momentum methods (GMMs)--a broad class that includes Nesterov's accelerated gradient, heavy-ball, and gradient descent methods--for minimizing smooth strongly convex objectives. We allow stochastic gradient errors that may be adversarial and biased, and quantify robustness of these methods to gradient errors via the risk-sensitive index (RSI) from robust control theory. For quadratic objectives with i.i.d. Gaussian noise, we give closed form expressions for RSI in terms of solutions to 2x2 matrix Riccati equations, revealing a Pareto frontier between RSI and convergence rate over the choice of step-size and momentum parameters. We then prove a large-deviation principle for time-averaged suboptimality in the large iteration limit and show that the rate function is, up to a scaling, the convex conjugate of the RSI function. We further show that the rate function and RSI are linked to the $H_\infty$-norm--a measure of robustness to the worst-case deterministic gradient errors--so that stronger worst-case robustness (smaller $H_\infty$-norm) leads to sharper decay of the tail probabilities for the average suboptimality. Beyond quadratics, under potentially biased sub-Gaussian gradient errors, we derive non-asymptotic bounds on a finite-time analogue of the RSI, yielding finite-time high-probability guarantees and non-asymptotic large-deviation bounds for the averaged iterates. In the case of smooth strongly convex functions, we also observe an analogous trade-off between RSI and convergence-rate bounds. To our knowledge, these are the first non-asymptotic guarantees for GMMs with biased gradients and the first risk-sensitive analysis of GMMs. Finally, we provide numerical experiments on a robust regression problem to illustrate our results.
Publisher OA PDF DOI
AGDA+: Proximal Alternating Gradient Descent Ascent Method with a Nonmonotone Adaptive Step-Size Search for Nonconvex Minimax Problems
arXiv (Cornell University) · 2024-06-20
preprintOpen accessSenior author
We consider double-regularized nonconvex-strongly concave (NCSC) minimax problems of the form $(P):\min_{x\in\mathcal{X}} \max_{y\in\mathcal{Y}}g(x)+f(x,y)-h(y)$, where $g$, $h$ are closed convex, $f$ is $L$-smooth in $(x,y)$ and strongly concave in $y$. We propose a proximal alternating gradient descent ascent method AGDA+ that can adaptively choose nonmonotone primal-dual stepsizes to compute an approximate stationary point for $(P)$ without requiring the knowledge of the global Lipschitz constant $L$ and the concavity modulus $μ$. Using a nonmonotone step-size search (backtracking) scheme, AGDA+ stands out by its ability to exploit the local Lipschitz structure and eliminates the need for precise tuning of hyper-parameters. AGDA+ achieves the optimal iteration complexity of $\mathcal{O}(ε^{-2})$ and it is the first step-size search method for NCSC minimax problems that require only $3$ calls to $\nabla f$ on average per backtracking iteration. The numerical experiments demonstrate its robustness and efficiency.
Publisher OA PDF DOI
Adaptive Algorithms for Robust Phase Retrieval
arXiv (Cornell University) · 2024-09-27
preprintOpen access
This paper considers the robust phase retrieval, which can be cast as a nonsmooth and nonconvex composite optimization problem. We propose two first-order algorithms with adaptive step sizes: the subgradient algorithm (AdaSubGrad) and the inexact proximal linear algorithm (AdaIPL). Our contribution lies in the novel design of adaptive step sizes based on quantiles of the absolute residuals. Local linear convergences of both algorithms are analyzed under different regimes for the hyper-parameters. Numerical experiments on synthetic datasets and image recovery also demonstrate that our methods are competitive against the existing methods in the literature utilizing predetermined (possibly impractical) step sizes, such as the subgradient methods and the inexact proximal linear method.
Publisher OA PDF DOI
Jointly Improving the Sample and Communication Complexities in Decentralized Stochastic Minimax Optimization
Proceedings of the AAAI Conference on Artificial Intelligence · 2024-03-24 · 4 citations
articleOpen access
We propose a novel single-loop decentralized algorithm, DGDA-VR, for solving the stochastic nonconvex strongly-concave minimax problems over a connected network of agents, which are equipped with stochastic first-order oracles to estimate their local gradients. DGDA-VR, incorporating variance reduction, achieves O(ε^−3) oracle complexity and O(ε^−2) communication complexity without resorting to multi-communication rounds – both are optimal, i.e., matching the lower bounds for this class of problems. Since DGDA-VR does not require multiple communication rounds, it is applicable to a broader range of decentralized computational environments. To the best of our knowledge, this is the first distributed method using a single communication round in each iteration to jointly optimize the oracle and communication complexities for the problem considered here.
Publisher OA PDF DOI
Robust Accelerated Primal-Dual Methods for Computing Saddle Points
SIAM Journal on Optimization · 2024-03-19
preprint
We consider strongly convex/strongly concave saddle point problems assuming we have access to unbiased stochastic estimates of the gradients. We propose a stochastic accelerated primal-dual (SAPD) algorithm and show that SAPD iterate sequence, generated using constant primal-dual step sizes, linearly converges to a neighborhood of the unique saddle point, where the size of the neighborhood is determined by the asymptotic variance of the iterates. Interpreting the asymptotic variance as a measure of robustness to gradient noise, we obtain explicit characterizations of robustness in terms of SAPD parameters and problem constants. Based on these characterizations, we develop computationally tractable techniques for optimizing the SAPD parameters, i.e., the primal and dual step sizes, and the momentum parameter, to achieve a desired trade-off between the convergence rate and robustness on the Pareto curve. This allows SAPD to enjoy fast convergence properties while being robust to noise as an accelerated method. We also show that SAPD admits convergence guarantees for the gap metric with a variance term optimal up to a logarithmic factor --which can be removed by employing a restarting strategy. Furthermore, to our knowledge, our work is the first one showing an iteration complexity result for the gap function on smooth SCSC problems without the bounded domain assumption. Finally, we illustrate the efficiency of our approach on distributionally robust logistic regression problems.
Publisher DOI
A Stochastic GDA Method With Backtracking For Solving Nonconvex Concave Minimax Problems
arXiv (Cornell University) · 2024-03-12 · 1 citations
preprintOpen access
We propose a stochastic GDA (gradient descent ascent) method with backtracking (SGDA-B) to solve nonconvex-concave (NCC) minimax problems of the form: $\min_{\mathbf{x}} \max_y \sum_{i=1}^N g_i(x_i)+f(\mathbf{x},y)-h(y)$, where $h$ and $g_i$ for $i=1,\cdots,N$ are closed, convex functions, and for some $L,μ\geq 0$, $f$ is $L$-smooth and $f(\mathbf{x},\cdot)$ is $μ$-strongly concave for all $\mathbf{x}$ in the problem domain. We consider the stochastic setting where one only has an access to an unbiased stochastic oracle of $\nabla f$ with a finite variance bound $σ^2$. While most of the existing methods assume knowledge of $L$, $μ$ and/or $σ^2$, SGDA-B is agnostic to all of these problem parameters. Moreover, SGDA-B can support random block-coordinate updates. In the deterministic setting, i.e., $σ^2=0$ and one can compute $\nabla f$ exactly, SGDA-B can compute an $ε$-stationary point within $\mathcal{O}(Lκ^2/ε^2)$ and $\mathcal{O}(L^3/ε^4)$ gradient calls when $μ>0$ and $μ=0$, respectively, where $κ\triangleq L/μ$. In the stochastic setting, i.e., $σ^2>0$, for any $p\in(0,1)$ and $ε>0$, it can compute an $ε$-stationary point with high probability, which requires $\mathcal{O}(L κ^3 ε^{-4} \log^2(1/p))$ and $\tilde{\mathcal{O}}(L^4ε^{-7}\log^2(1/p))$ stochastic oracle calls, with probability at least $1-p$, when $μ>0$ and $μ=0$, respectively. To our knowledge, SGDA-B is the first GDA-type method with backtracking to solve NCC minimax problems and achieves the best complexity among the methods that are agnostic to $L$, $μ$ and $σ^2$. We also provide numerical results for SGDA-B on a distributionally robust learning problem illustrating the potential performance gains that can be achieved by SGDA-B.
Publisher OA PDF DOI
High-probability complexity guarantees for nonconvex minimax problems
arXiv (Cornell University) · 2024-05-23
preprintOpen access
Stochastic smooth nonconvex minimax problems are prevalent in machine learning, e.g., GAN training, fair classification, and distributionally robust learning. Stochastic gradient descent ascent (GDA)-type methods are popular in practice due to their simplicity and single-loop nature. However, there is a significant gap between the theory and practice regarding high-probability complexity guarantees for these methods on stochastic nonconvex minimax problems. Existing high-probability bounds for GDA-type single-loop methods only apply to convex/concave minimax problems and to particular non-monotone variational inequality problems under some restrictive assumptions. In this work, we address this gap by providing the first high-probability complexity guarantees for nonconvex/PL minimax problems corresponding to a smooth function that satisfies the PL-condition in the dual variable. Specifically, we show that when the stochastic gradients are light-tailed, the smoothed alternating GDA method can compute an $\varepsilon$-stationary point within $O(\frac{\ell κ^2 δ^2}{\varepsilon^4} + \fracκ{\varepsilon^2}(\ell+δ^2\log({1}/{\bar{q}})))$ stochastic gradient calls with probability at least $1-\bar{q}$ for any $\bar{q}\in(0,1)$, where $μ$ is the PL constant, $\ell$ is the Lipschitz constant of the gradient, $κ=\ell/μ$ is the condition number, and $δ^2$ denotes a bound on the variance of stochastic gradients. We also present numerical results on a nonconvex/PL problem with synthetic data and on distributionally robust optimization problems with real data, illustrating our theoretical findings.
Publisher OA PDF DOI
High-probability complexity bounds for stochastic non-convex minimax optimization
2024-01-01
article1st authorCorresponding
Publisher DOI

Recent grants

Decentralized Power Flow Optimization on Electricity Grids via Distributed Consensus Methods
NSF · $236k · 2016–2020

Frequent coauthors

Mert Gürbüzbalaban
Rutgers, The State University of New Jersey
22 shared
Erfan Yazdandoost Hamedani
21 shared
Constantino Lagoa
18 shared
Garud Iyengar
16 shared
Mahmoud Ashour
Marmara University
15 shared
Asuman Ozdaglar
14 shared
Alireza Fallah
14 shared
Omar M. Sleem
Pennsylvania State University
9 shared

Education

PhD, Industrial Engineering and Operations Research
Columbia University
2011

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Necdet Serhat Aybat

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you