Robert M. Freund

· Theresa Seley Professor in Management Science

Massachusetts Institute of Technology · Operations Research and Statistics

Active 1977–2024

h-index32

Citations7.3k

Papers17014 last 5y

Funding—

Faculty page Lab page

OpenAlex

See your match with Robert M. Freund — sign in to PhdFit.Sign in

About

Robert M. Freund is the Theresa Seley Professor in Management Science and a Professor of Operations Research at the MIT Sloan School of Management. His main research interests include convex optimization, computational complexity, convex geometry, large-scale nonlinear optimization, and related mathematical systems. His recent work focuses on first-order methods and their connections to statistical and machine learning. Freund has served as coeditor of the journal Mathematical Programming and as associate editor for several optimization and operations research journals. He has held leadership roles such as Co-Director of the MIT Operations Research Center, the MIT Program in Computation for Design and Optimization, and Chair of the INFORMS Optimization Section. Freund received his BA in mathematics from Princeton University and his MS and PhD in operations research from Stanford University. He has been recognized with awards including the Longuet-Higgins Prize in computer vision and the MIT Seegal Prize for inspiring students to pursue and achieve excellence.

Research topics

Computer Science
Artificial Intelligence
Mathematical optimization
Mathematics
Algorithm
Applied mathematics
Combinatorics

Selected publications

Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses
Mathematical Programming · 2024-08-22 · 6 citations
article
Publisher DOI
Using Taylor-Approximated Gradients to Improve the Frank–Wolfe Method for Empirical Risk Minimization
SIAM Journal on Optimization · 2024-07-11 · 1 citations
articleSenior author
Publisher DOI
Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses
arXiv (Cornell University) · 2023-01-04 · 1 citations
preprintOpen access
We propose a computer-assisted approach to the analysis of the worst-case convergence of nonlinear conjugate gradient methods (NCGMs). Those methods are known for their generally good empirical performances for large-scale optimization, while having relatively incomplete analyses. Using our computer-assisted approach, we establish novel complexity bounds for the Polak-Ribière-Polyak (PRP) and the Fletcher-Reeves (FR) NCGMs for smooth strongly convex minimization. In particular, we construct mathematical proofs that establish the first non-asymptotic convergence bound for FR (which is historically the first developed NCGM), and a much improved non-asymptotic convergence bound for PRP. Additionally, we provide simple adversarial examples on which these methods do not perform better than gradient descent with exact line search, leaving very little room for improvements on the same class of problems.
Publisher OA PDF DOI
An Oblivious Ellipsoid Algorithm for Solving a System of (In)Feasible Linear Inequalities
Mathematics of Operations Research · 2023-03-23 · 2 citations
article
The ellipsoid algorithm is a fundamental algorithm for computing a solution to the system of m linear inequalities in n variables [Formula: see text] when its set of solutions has positive volume. However, when [Formula: see text] is infeasible, the ellipsoid algorithm has no mechanism for proving that (P) is infeasible. This is in contrast to the other two fundamental algorithms for tackling [Formula: see text], namely, the simplex and interior-point methods, each of which can be easily implemented in a way that either produces a solution of [Formula: see text] or proves that [Formula: see text] is infeasible by producing a solution to the alternative system [Formula: see text]. This paper develops an oblivious ellipsoid algorithm (OEA) that either produces a solution of [Formula: see text] or produces a solution of [Formula: see text]. Depending on the dimensions and other condition measures, the computational complexity of the basic OEA may be worse than, the same as, or better than that of the standard ellipsoid algorithm. We also present two modified versions of OEA, whose computational complexity is superior to that of OEA when [Formula: see text]. This is achieved in the first modified version by proving infeasibility without producing a solution of [Formula: see text], and in the second version by using more memory. Funding: J. Lamperski and R. M. Freund were supported by the Air Force Office of Scientific Research [Grant FA9550-19-1-0240].
Publisher DOI
On the Relation Between LP Sharpness and Limiting Error Ratio and Complexity Implications for Restarted PDHG
arXiv (Cornell University) · 2023-12-21
preprintOpen accessSenior author
There has been a recent surge in development of first-order methods (FOMs) for solving huge-scale linear programming (LP) problems. The attractiveness of FOMs for LP stems in part from the fact that they avoid costly matrix factorization computation. However, the efficiency of FOMs is significantly influenced - both in theory and in practice - by certain instance-specific LP condition measures. Xiong and Freund recently showed that the performance of the restarted primal-dual hybrid gradient method (PDHG) is predominantly determined by two specific condition measures: LP sharpness and Limiting Error Ratio. In this paper we examine the relationship between these two measures, particularly in the case when the optimal solution is unique (which is generic - at least in theory), and we present an upper bound on the Limiting Error Ratio involving the reciprocal of the LP sharpness. This shows that in LP instances where there is a dual nondegenerate optimal solution, the computational complexity of restarted PDHG can be characterized solely in terms of LP sharpness and the distance to optimal solutions, and simplifies the theoretical complexity upper bound of restarted PDHG for these instances.
Publisher OA PDF DOI
Paradoxical Interventions
2023-07-20
book-chapterSenior author
Abstract This chapter outlines the evidence base for the use of paradoxical interventions (PIs) in individual psychotherapy. Often misunderstood, PIs have shown long-term (distal) impacts on clinical outcomes. Definitions of PIs and their constituent elements (double-binds, reframing, circular causality) are presented along with clinical examples. The chapter reports on one meta-analysis comparing PIs with a placebo or control and another comparing PIs to other therapeutic methods. Paradoxes demonstrated a large effect (d = 1.1, k = 17 studies) compared to controls and a medium effect size (d = 0.49, k = 17 studies) compared to other therapeutic methods. The chapter concludes with diversity considerations, training recommendations, and therapeutic practices.
Publisher DOI
Computational Guarantees for Restarted PDHG for LP based on "Limiting Error Ratios" and LP Sharpness
arXiv (Cornell University) · 2023-12-22
preprintOpen accessSenior author
In recent years, there has been growing interest in solving linear optimization problems - or more simply "LP" - using first-order methods in order to avoid the costly matrix factorizations of traditional methods for huge-scale LP instances. The restarted primal-dual hybrid gradient method (PDHG) - together with some heuristic techniques - has emerged as a powerful tool for solving huge-scale LPs. However, the theoretical understanding of the restarted PDHG and the validation of various heuristic implementation techniques are still very limited. Existing complexity analyses have relied on the Hoffman constant of the LP KKT system, which is known to be overly conservative, difficult to compute, and fails to offer insight into instance-specific characteristics of the LP problems. These limitations have limited the capability to discern which characteristics of LP instances lead to easy versus difficult instances. With the goal of overcoming these limitations, we introduce and develop two purely geometry-based condition measures for LP instances: "limiting error ratio" and LP sharpness. We provide new computational guarantees for the restarted PDHG based on these two condition measures. For limiting error ratio, we provide a computable upper bound and show its relationship with the data instance's proximity to infeasibility under perturbation. For LP sharpness, we prove its equivalence to the stability of the LP optimal solution set under perturbation of the objective function. Our computational guarantees are validated by constructed instances. Conversely, our computational guarantees validate the practical efficacy of certain heuristic techniques (row preconditioners and step-size tuning) that improve computational performance. Finally, we present computational experiments on LP relaxations from the MIPLIB dataset that demonstrate the promise of various implementation strategies.
Publisher OA PDF DOI
Analysis of the Frank–Wolfe method for convex composite optimization involving a logarithmically-homogeneous barrier
Mathematical Programming · 2022 · 15 citations
Senior authorCorresponding
- Computer Science
- Algorithm
- Computer Science
Abstract We present and analyze a new generalized Frank–Wolfe method for the composite optimization problem $$(P): {\min }_{x\in {\mathbb {R}}^n} \; f(\mathsf {A} x) + h(x)$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>:</mml:mo><mml:msub><mml:mo>min</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mi>n</mml:mi></mml:msup></mml:mrow></mml:msub><mml:mspace/><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>A</mml:mi><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math> , where f is a $$\theta $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mi>θ</mml:mi></mml:math> -logarithmically-homogeneous self-concordant barrier, $$\mathsf {A}$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mi>A</mml:mi></mml:math> is a linear operator and the function h has a bounded domain but is possibly non-smooth. We show that our generalized Frank–Wolfe method requires $$O((\delta _0 + \theta + R_h)\ln (\delta _0) + (\theta + R_h)^2/\varepsilon )$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mrow><mml:mi>O</mml:mi><mml:mo>(</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>δ</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mi>θ</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>ln</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>δ</mml:mi><mml:mn>0</mml:mn></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>θ</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup><mml:mo>/</mml:mo><mml:mi>ε</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math> iterations to produce an $$\varepsilon $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mi>ε</mml:mi></mml:math> -approximate solution, where $$\delta _0$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>δ</mml:mi><mml:mn>0</mml:mn></mml:msub></mml:math> denotes the initial optimality gap and $$R_h$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:msub><mml:mi>R</mml:mi><mml:mi>h</mml:mi></mml:msub></mml:math> is the variation of h on its domain. This result establishes certain intrinsic connections between $$\theta $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mi>θ</mml:mi></mml:math> -logarithmically homogeneous barriers and the Frank–Wolfe method. When specialized to the D -optimal design problem, we essentially recover the complexity obtained by Khachiyan (Math Oper Res 21 (2): 307–320, 1996) using the Frank–Wolfe method with exact line-search. We also study the (Fenchel) dual problem of ( P ), and we show that our new method is equivalent to an adaptive-step-size mirror descent method applied to the dual problem. This enables us to provide iteration complexity bounds for the mirror descent method despite the fact that the dual objective function is non-Lipschitz and has unbounded domain. In addition, we present computational experiments that point to the potential usefulness of our generalized Frank–Wolfe method on Poisson image de-blurring problems with TV regularization, and on simulated PET problem instances.
Publisher OA PDF DOI
Using Taylor-Approximated Gradients to Improve the Frank-Wolfe Method for Empirical Risk Minimization
arXiv (Cornell University) · 2022-08-30 · 1 citations
preprintOpen accessSenior author
The Frank-Wolfe method has become increasingly useful in statistical and machine learning applications, due to the structure-inducing properties of the iterates, and especially in settings where linear minimization over the feasible set is more computationally efficient than projection. In the setting of Empirical Risk Minimization -- one of the fundamental optimization problems in statistical and machine learning -- the computational effectiveness of Frank-Wolfe methods typically grows linearly in the number of data observations $n$. This is in stark contrast to the case for typical stochastic projection methods. In order to reduce this dependence on $n$, we look to second-order smoothness of typical smooth loss functions (least squares loss and logistic loss, for example) and we propose amending the Frank-Wolfe method with Taylor series-approximated gradients, including variants for both deterministic and stochastic settings. Compared with current state-of-the-art methods in the regime where the optimality tolerance $\varepsilon$ is sufficiently small, our methods are able to simultaneously reduce the dependence on large $n$ while obtaining optimal convergence rates of Frank-Wolfe methods, in both the convex and non-convex settings. We also propose a novel adaptive step-size approach for which we have computational guarantees. Last of all, we present computational experiments which show that our methods exhibit very significant speed-ups over existing methods on real-world datasets for both convex and non-convex binary classification problems.
Publisher OA PDF DOI
Generalized stochastic Frank–Wolfe algorithm with stochastic “substitute” gradient for structured convex optimization
Mathematical Programming · 2020 · 19 citations
Senior authorCorresponding
- Computer Science
- Mathematics
- Mathematical optimization
Publisher DOI

Frequent coauthors

Paul Grigas
18 shared
Alexandre Belloni
15 shared
Han Men
American Institute of Aeronautics and Astronautics
14 shared
J. Peraire
Massachusetts Institute of Technology
13 shared
Rahul Mazumder
Massachusetts Institute of Technology
13 shared
Ngoc Cuong Nguyen
10 shared
Haihao Lu
10 shared
Fernando Ordóñez
8 shared

Awards & honors

MIT’s 2020 Seegal prize
INFORMS Fellow (2018)
Longuet-Higgins Prize in computer vision (2007)
Samuel M. Seegal Faculty Prize

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Robert M. Freund

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you