Vivek F. Farias

Patrick J. McGovern (1959) Professor · Verified

Massachusetts Institute of Technology · Operations Management

Active 2005–2026

h-index: 29

Citations: 3.8k

Papers: 103 (45 in the last 5 years)

Funding: $872k


About

Vivek Farias is a professor whose research and teaching focus on operations management, decision-making under uncertainty, and data-driven optimization. His work involves learning from commerce data, experimentation, control in online platforms, and large-scale optimization problems. He has mentored numerous students, many of whom have gone on to become assistant professors at leading universities or hold prominent roles in industry and research. His academic contributions include algorithms for large-scale personalization, revenue management, and fairness in operations. His recognitions include the INFORMS Dantzig Dissertation Award (Third Prize).

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Political Science
Econometrics
Mathematics
Economics
Business
Geography
Medicine
Statistics
Operations research
Actuarial science
Engineering
Mathematical optimization
Chemistry
Chromatography
Nanotechnology
Algorithm
Environmental health
World Wide Web
Biochemistry
Materials science
Data science

Selected publications

The Value of Covariance Matching in Gaussian DDPMs and the Lanczos Sampler
ArXiv.org · 2026-05-21
article · Open access
A central error measure in Gaussian DDPMs is the path-space KL divergence between the exact reverse chain and the learned Gaussian reverse process. This quantity is especially relevant for procedures such as classifier guidance, which perturb the entire reverse trajectory rather than only the terminal sample. Prior analyses show that standard isotropic reverse covariances suffer an unavoidable $Ω(1/T)$ path-KL error as the number of denoising steps $T$ grows. We show that matching the full posterior covariance breaks this barrier, yielding an order-wise improvement that reduces the path KL to $O(1/T^2)$. To make full covariance matching practical, we introduce the Lanczos Gaussian sampler (LGS), a training-free, matrix-free method for sampling from the optimal reverse covariance using only covariance-vector products, which are available through Jacobian-vector products of the posterior mean. LGS avoids dense covariance storage and auxiliary covariance models. We prove that LGS approximation error decays exponentially in the number of Lanczos steps, where each Lanczos step requires a single Jacobian-vector product. Empirically, using just three such steps improves sample quality over strong diagonal-covariance baselines, including OCM-DDPM, across standard image benchmarks. This identifies full covariance matching as both theoretically valuable and practically accessible for fast DDPM sampling.
Misspecified Explore-then-Exploit Leads to Supra-Competitive Prices
ArXiv.org · 2026-05-15
article · Open access
We study whether simple algorithmic pricing systems can systematically produce collusive-like prices in multi-firm markets. We consider firms using an explore-then-exploit pipeline: they randomize prices during an initial exploration phase, then estimate demand from their own historical data and set prices myopically thereafter. The estimation step relies on a misspecified, monopoly-style model that omits competitors' prices. We characterize when this pipeline converges to supra-competitive prices above the Nash equilibrium, via a fluid-limit ordinary differential equation analysis. We show that supra-competitive prices arise when firms explore within similar price ranges on the same side of the Nash price. Moreover, prices can be substantially above the Nash price; we show that prices can reach monopoly levels under symmetric exploration. Simulations calibrated to a real multifamily rental market confirm that supra-competitive outcomes arise robustly beyond our theoretical assumptions, including under finite horizons, heterogeneous products, and nonlinear logit demand.
Policy Optimization for Personalized Interventions in Behavioral Health
Manufacturing & Service Operations Management · 2025-03-19 · 1 citation
article
Problem definition: Behavioral health interventions, delivered through digital platforms, have the potential to significantly improve health outcomes through education, motivation, reminders, and outreach. We study the problem of optimizing personalized interventions for patients to maximize a long-term outcome, in which interventions are costly and capacity constrained. We assume we have access to a historical data set collected from an initial pilot study. Methodology/results: We present a new approach for this problem that we dub [Formula: see text], which decomposes the state space for a system of patients to the individual level and then approximates one step of policy iteration. Implementing [Formula: see text] simply consists of a prediction task using the data set, alleviating the need for online experimentation. [Formula: see text] is a generic, model-free algorithm that can be used irrespective of the underlying patient behavior model. We derive theoretical guarantees on a simple, special case of the model that is representative of our problem setting. When the initial policy used to collect the data is randomized, we establish an approximation guarantee for [Formula: see text] with respect to the improvement beyond a null policy that does not allocate interventions. We show that this guarantee is robust to estimation errors. We then conduct a rigorous empirical case study using real-world data from a mobile health platform for improving treatment adherence for tuberculosis. Using a validated simulation model, we demonstrate that [Formula: see text] can provide the same efficacy as the status quo approach with approximately half the capacity of interventions. Managerial implications: [Formula: see text] is simple and easy to implement for an organization aiming to improve long-term behavior through targeted interventions, and this paper demonstrates its strong performance both theoretically and empirically, particularly in resource-limited settings. 
Funding: The authors are grateful for financial research support from the MIT Sloan Health Systems Initiative. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.0548 .
E-GEO: A Testbed for Generative Engine Optimization in E-Commerce
ArXiv.org · 2025-11-25 · 2 citations
preprint · Open access
With the rise of large language models (LLMs), generative engines are becoming powerful alternatives to traditional search, reshaping retrieval tasks. In e-commerce, for instance, conversational shopping agents now guide consumers to relevant products. This shift has created the need for generative engine optimization (GEO)--improving content visibility and relevance for generative engines. Yet despite its growing importance, current GEO practices are ad hoc, and their impacts remain poorly understood, especially in e-commerce. We address this gap by introducing E-GEO, the first benchmark built specifically for e-commerce GEO. E-GEO contains over 7,000 realistic, multi-sentence consumer product queries paired with relevant listings, capturing rich intent, constraints, preferences, and shopping contexts that existing datasets largely miss. Using this benchmark, we conduct the first large-scale empirical study of e-commerce GEO, evaluating 15 common rewriting heuristics and comparing their empirical performance. To move beyond heuristics, we further formulate GEO as a tractable optimization problem and develop a lightweight iterative prompt-optimization algorithm that can significantly outperform these baselines. Surprisingly, the optimized prompts reveal a stable, domain-agnostic pattern--suggesting the existence of a "universally effective" GEO strategy. Our data and code are publicly available at https://github.com/psbagga17/E-GEO.
The Limits to Learning a Diffusion Model
Management Science · 2025-04-18 · 1 citation
article
This paper provides the first sample complexity lower bounds for the estimation of simple diffusion models, including the Bass model (used in modeling consumer adoption) and the Susceptible-Infected-Recovered (SIR) model (used in modeling epidemics). We show that one cannot hope to learn such models until quite late in the diffusion. Specifically, we show that the time required to collect a number of observations that exceeds our sample complexity lower bounds is large. For the Bass model, our results imply that when new adopters are predominantly driven by imitation, one cannot hope to predict the eventual number of adopting customers until one is at least two-thirds of the way to the time at which the rate of new adopters is at its peak. In a similar vein, our results imply that in the case of an SIR model, one cannot hope to predict the eventual number of infections until one is approximately two-thirds of the way to the time at which the infection rate has peaked. This lower bound in estimation further translates into a lower bound in regret for decision making in epidemic interventions. Our results formalize the challenge of accurate forecasting and highlight the importance of incorporating additional data sources. To this end, we analyze the benefit of a seroprevalence study in an epidemic, where we characterize the size of the study needed to improve SIR model estimation. Extensive empirical analyses on product adoption and epidemic data support our theoretical findings. This paper was accepted by David Simchi-Levi, data science. Funding: This work was supported by the Division of Civil, Mechanical and Manufacturing Innovation [Grant 1727239]. Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2022.02953 .
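For readers unfamiliar with the Bass model named in the abstract above, the following is a minimal sketch of its standard closed form and of the "two-thirds of the way to the peak" time the abstract refers to. It does not implement the paper's lower bounds. The function names and the parameter values p = 0.01 (innovation) and q = 0.4 (imitation) are illustrative assumptions, not taken from the paper.

```python
import math


def bass_cumulative_fraction(t, p, q):
    """Fraction F(t) of eventual adopters who have adopted by time t."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_adoption_rate(t, p, q):
    """Adoption rate dF/dt = (p + q * F(t)) * (1 - F(t))."""
    f_t = bass_cumulative_fraction(t, p, q)
    return (p + q * f_t) * (1.0 - f_t)


def bass_peak_time(p, q):
    """Time at which the adoption rate peaks (0 when imitation does not dominate)."""
    if q <= p:
        return 0.0
    return math.log(q / p) / (p + q)


# Illustrative parameters only: imitation (q) dominates innovation (p).
p, q = 0.01, 0.4
t_peak = bass_peak_time(p, q)
t_two_thirds = 2.0 * t_peak / 3.0
share_seen = bass_cumulative_fraction(t_two_thirds, p, q)
print(f"peak at t={t_peak:.2f}; share of eventual adopters seen by 2/3 of peak: {share_seen:.2%}")
```

In this illustrative setting only a minority of eventual adopters has been observed at two-thirds of the peak time, which gives some intuition for why early forecasting is hard.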
Generative AI as a New Platform for Applications Development
2024 · 4 citations
- Computer Science
- Artificial Intelligence
Generative AI (GenAI) is rapidly emerging as a powerful new platform for software development, a foundational technology enabling a wide variety of applications, like past platforms such as computers, smartphones, and cloud services. Major players include producers of large language models (LLMs) like OpenAI, Google, and Meta; hardware and software providers like Nvidia; and cloud infrastructure providers like Amazon, Microsoft, and Google. An ecosystem is evolving with infrastructure layers, foundation LLM models, an array of tools and frameworks, and a rapidly growing set of applications spanning horizontal products for broad usage as well as customized vertical solutions. Key issues going forward include potential market concentration, data privacy and ownership concerns, the accuracy and reliability of AI-generated content, regulation versus self-governance challenges, disruption to jobs and industries, and significant environmental impacts from increased energy consumption. As capabilities and adoption of this new AI technology expand, companies, universities, governments, and technology experts must think carefully and collaboratively about the costs, benefits, trade-offs, and potential dangers of GenAI as a new applications platform.
Speeding up Policy Simulation in Supply Chain RL
arXiv (Cornell University) · 2024-06-04
preprint · Open access · 1st author · Corresponding
Simulating a single trajectory of a dynamical system under some state-dependent policy is a core bottleneck in policy optimization (PO) algorithms. The many inherently serial policy evaluations that must be performed in a single simulation constitute the bulk of this bottleneck. In applying PO to supply chain optimization (SCO) problems, simulating a single sample path corresponding to one month of a supply chain can take several hours. We present an iterative algorithm to accelerate policy simulation, dubbed Picard Iteration. This scheme carefully assigns policy evaluation tasks to independent processes. Within an iteration, any given process evaluates the policy only on its assigned tasks while assuming a certain "cached" evaluation for other tasks; the cache is updated at the end of the iteration. Implemented on GPUs, this scheme admits batched evaluation of the policy across a single trajectory. We prove that the structure afforded by many SCO problems allows convergence in a small number of iterations independent of the horizon. We demonstrate practical speedups of 400x on large-scale SCO problems even with a single GPU, and also demonstrate practical efficacy in other RL environments.
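The caching idea described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation. The inventory dynamics (`step`), the base-stock rule (`policy`), and the demand sequence are hypothetical. The list comprehension stands in for what the abstract describes as batched policy evaluation on a GPU.

```python
def policy(state):
    """Hypothetical base-stock rule: order up to 10 units."""
    return max(0, 10 - state)


def step(state, action, demand):
    """Hypothetical inventory dynamics with lost sales."""
    return max(0, state + action - demand)


def simulate_sequential(s0, demands):
    """Reference simulation: one policy call per step, strictly in order."""
    states, actions = [s0], []
    for demand in demands:
        action = policy(states[-1])
        actions.append(action)
        states.append(step(states[-1], action, demand))
    return states, actions


def simulate_picard(s0, demands):
    """Fixed-point simulation: reuse cached actions, re-evaluate all steps at once."""
    horizon = len(demands)
    cached = [0] * horizon  # initial guess for every action
    for iteration in range(1, horizon + 2):
        # Cheap pass: roll the dynamics forward using the cached actions only.
        states = [s0]
        for t in range(horizon):
            states.append(step(states[-1], cached[t], demands[t]))
        # Policy evaluations are independent across t here, so they could be batched.
        new_actions = [policy(states[t]) for t in range(horizon)]
        if new_actions == cached:
            return states, cached, iteration  # trajectory is self-consistent
        cached = new_actions
    # Each pass fixes at least one more leading action, so this is not reached.
    raise RuntimeError("Picard iteration failed to converge")


demands = [3, 4, 2, 6, 1, 5, 2, 3]
seq_states, seq_actions = simulate_sequential(5, demands)
pic_states, pic_actions, n_iters = simulate_picard(5, demands)
print(pic_states == seq_states, n_iters)
```

The sketch only shows that the fixed point matches the sequential trajectory. The toy gives no evidence about the speedups or horizon-independent iteration counts claimed in the abstract, which depend on problem structure and hardware.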
Fixing Inventory Inaccuracies at Scale
Manufacturing & Service Operations Management · 2024-03-14 · 6 citations
article · 1st author · Corresponding
Problem definition: Inaccurate records of inventory occur frequently and, by some measures, cost retailers approximately 4% in annual sales. Detecting inventory inaccuracies manually is cost-prohibitive, and existing algorithmic solutions rely almost exclusively on learning from longitudinal data, which is insufficient in the dynamic environment induced by modern retail operations. Instead, we propose a solution based on cross-sectional data over stores and stock-keeping units (SKUs), viewing inventory inaccuracies as a problem of identifying anomalies in a (low-rank) Poisson matrix. State-of-the-art approaches to anomaly detection in low-rank matrices apparently fall short. Specifically, from a theoretical perspective, recovery guarantees for these approaches require that nonanomalous entries be observed with vanishingly small noise (which is not the case in our problem and, indeed, in many applications). Methodology/results: So motivated, we propose a conceptually simple entrywise approach to anomaly detection in low-rank Poisson matrices. Our approach accommodates a general class of probabilistic anomaly models. We show that the cost incurred by our algorithm approaches that of an optimal algorithm at a min-max optimal rate. Using synthetic data and real data from a consumer goods retailer, we show that our approach provides up to a 10× cost reduction over incumbent approaches to anomaly detection. Along the way, we build on recent work that seeks entrywise error guarantees for matrix completion, establishing such guarantees for subexponential matrices, a result of independent interest. Managerial implications: By utilizing cross-sectional data at scale, our novel approach provides a practical solution to the issue of inventory inaccuracies in retail operations. Our method is cost-effective and can help managers detect inventory inaccuracies quickly, leading to increased sales and improved customer satisfaction. 
In addition, the entrywise error guarantees that we establish are of interest to academics working on matrix-completion problems. History: This paper was selected for Fast Track in M&SOM from the 2022 MSOM Supply Chain Management SIG Conference. Funding: Financial support from the National Science Foundation Division of Civil, Mechanical, and Manufacturing Innovation [Grant CMMI 1727239] is gratefully acknowledged. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.0146 .

Recent grants

An Optimization Framework for Dynamic A-B Testing
NSF · $472k · 2017–2022
CAREER: Large Scale Stochastic Control: A Math Programming and Discrete Optimization Lens
NSF · $400k · 2011–2017

Frequent coauthors

Tianyi Peng · Zhejiang Sci-Tech University · 19 shared
Devavrat Shah · 18 shared
Srikanth Jagabathula · 14 shared
Andrew A. Li · Carnegie Mellon University · 14 shared
Ciamac C. Moallemi · 13 shared
Andrew Zheng · 12 shared
Deeksha Sinha · Massachusetts Institute of Technology · 12 shared
Retsef Levi · 12 shared

Labs

Vivek F. Farias Lab · PI
Not provided

Awards & honors

INFORMS Fellow (2025)
Pierskalla Best Paper Award from the Health Applications Soc…
Daniel H. Wagner Prize for Excellence in the Practice of Adv…
Institute for Operations Research and the Management Science…
INFORMS MSOM Best Publication Award in Management Science (2…
