Benjamin Van Roy

Professor of Electrical Engineering, of Management Science and Engineering and, by courtesy, of Computer Science · Verified

Stanford University · Management Science and Engineering

Active 1995–2026

h-index: 51
Citations: 12.7k
Papers: 255 (66 in the last 5 years)
Funding: $1.2M

About

Benjamin Van Roy is a Professor of Electrical Engineering, of Management Science and Engineering, and, by courtesy, of Computer Science at Stanford University. His recent publications address reinforcement learning and decision-making under uncertainty, including regret analysis of Thompson sampling in bandits, continual learning under computational constraints, information-theoretic foundations of machine learning and neural scaling laws, reward misspecification, and the aggregation of human feedback.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Data Mining
  • Statistics
  • Mathematical optimization
  • Mathematics

Selected publications

  • Prior Diffusiveness and Regret in the Linear-Gaussian Bandit

    arXiv (Cornell University) · 2026-01-05

    Preprint · Open access · Senior author

    We prove that Thompson sampling exhibits $\tilde{O}(\sigma d \sqrt{T} + d r \sqrt{\mathrm{Tr}(\Sigma_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(\mu_0, \Sigma_0)$ prior distribution on the coefficients, where $d$ is the dimension, $T$ is the time horizon, $r$ is the maximum $\ell_2$ norm of the actions, and $\sigma^2$ is the noise variance. In contrast to existing regret bounds, this shows that to within logarithmic factors, the prior-dependent "burn-in" term $d r \sqrt{\mathrm{Tr}(\Sigma_0)}$ decouples additively from the minimax (long run) regret $\sigma d \sqrt{T}$. Previous regret bounds exhibit a multiplicative dependence on these terms. We establish these results via a new "elliptical potential" lemma, and also provide a lower bound indicating that the burn-in term is unavoidable.

  • Misalignment from Treating Means as Ends

    Proceedings of the AAAI Conference on Artificial Intelligence · 2026-03-14

    Article · Open access · Senior author

    Reward functions, learned or manually specified, are rarely perfect. Instead of accurately expressing human goals, these reward functions are often distorted by human beliefs about how best to achieve those goals. Specifically, these reward functions often express a combination of the human's terminal goals — those which are ends in themselves — and the human's instrumental goals — those which are means to an end. We formulate a simple example in which even slight conflation of instrumental and terminal goals results in severe misalignment: optimizing the misspecified reward function r̂ results in poor performance when measured by the true reward function r. This example distills the essential properties of environments that make reinforcement learning highly sensitive to conflation of instrumental and terminal goals. We discuss how this issue can arise with a common approach to reward learning and how it can manifest in real environments.

  • Continual Learning as Computationally Constrained Reinforcement Learning

    Foundations and Trends® in Machine Learning · 2025-08-20

    Article · Senior author

    An agent that accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and tools to stimulate further research. We also present a range of empirical case studies to illustrate the roles of forgetting, relearning, exploration, and auxiliary learning. Metrics presented in previous literature for evaluating continual learning agents tend to focus on particular behaviors that are deemed desirable, such as avoiding catastrophic forgetting, retaining plasticity, relearning quickly, and maintaining low memory or compute footprints. In order to systematically reason about design choices and compare agents, a coherent, holistic objective that encompasses all such requirements would be helpful. To provide such an objective, we cast continual learning as reinforcement learning with limited compute resources. In particular, we pose the continual learning objective to be the maximization of infinite-horizon average reward subject to a computational constraint. Continual supervised learning, for example, is a special case of our general formulation where the reward is taken to be negative log-loss or accuracy. Among the implications of maximizing average reward are that remembering all information from the past is unnecessary, forgetting nonrecurring information is not “catastrophic,” and learning about how an environment changes over time is useful. Computational constraints give rise to informational constraints in the sense that they limit the amount of information used to make decisions. 
A consequence is that, unlike in more common framings of machine learning in which per-timestep regret vanishes as an agent accumulates information, the regret experienced in continual learning typically persists. Related to this is that even in stationary environments, informational constraints can incentivize perpetual adaptation. Informational constraints also give rise to the familiar stability-plasticity dilemma, which we formalize in information-theoretic terms.

  • A Systematic Review of Plant Leaf Disease Identification Using Image Search Engines

    Auerbach Publications eBooks · 2025-03-05

    Review

    The advent of digital cameras and mobile devices with built-in cameras has led to an explosion in the number of digital images available today. With this vast amount of visual data, efficient and effective methods for image retrieval and analysis are becoming increasingly important. Image search is a fundamental task in computer vision and has various applications, including healthcare, security, and social media.

  • Misalignment from Treating Means as Ends

    ArXiv.org · 2025-07-15

    Preprint · Open access · Senior author

    Reward functions, learned or manually specified, are rarely perfect. Instead of accurately expressing human goals, these reward functions are often distorted by human beliefs about how best to achieve those goals. Specifically, these reward functions often express a combination of the human's terminal goals -- those which are ends in themselves -- and the human's instrumental goals -- those which are means to an end. We formulate a simple example in which even slight conflation of instrumental and terminal goals results in severe misalignment: optimizing the misspecified reward function results in poor performance when measured by the true reward function. This example distills the essential properties of environments that make reinforcement learning highly sensitive to conflation of instrumental and terminal goals. We discuss how this issue can arise with a common approach to reward learning and how it can manifest in real environments.

  • Granular feedback merits sophisticated aggregation

    ArXiv.org · 2025-07-16

    Preprint · Open access · Senior author

    Human feedback is increasingly used across diverse applications like training AI models, developing recommender systems, and measuring public opinion -- with granular feedback often being preferred over binary feedback for its greater informativeness. While it is easy to accurately estimate a population's distribution of feedback given feedback from a large number of individuals, cost constraints typically necessitate using smaller groups. A simple method to approximate the population distribution is regularized averaging: compute the empirical distribution and regularize it toward a prior. Can we do better? As we will discuss, the answer to this question depends on feedback granularity. Suppose one wants to predict a population's distribution of feedback using feedback from a limited number of individuals. We show that, as feedback granularity increases, one can substantially improve upon predictions of regularized averaging by combining individuals' feedback in ways more sophisticated than regularized averaging. Our empirical analysis using questions on social attitudes confirms this pattern. In particular, with binary feedback, sophistication barely reduces the number of individuals required to attain a fixed level of performance. By contrast, with five-point feedback, sophisticated methods match the performance of regularized averaging with about half as many individuals.

  • The Need for a Big World Simulator: A Scientific Challenge for Continual Learning

    arXiv (Cornell University) · 2024-08-06

    Preprint · Open access · Senior author

    The "small agent, big world" frame offers a conceptual view that motivates the need for continual learning. The idea is that a small agent operating in a much bigger world cannot store all information that the world has to offer. To perform well, the agent must be carefully designed to ingest, retain, and eject the right information. To enable the development of performant continual learning agents, a number of synthetic environments have been proposed. However, these benchmarks suffer from limitations, including unnatural distribution shifts and a lack of fidelity to the "small agent, big world" framing. This paper aims to formalize two desiderata for the design of future simulated environments. These two criteria aim to reflect the objectives and complexity of continual learning in practical settings while enabling rapid prototyping of algorithms on a smaller scale.

  • Information-Theoretic Foundations for Neural Scaling Laws

    arXiv (Cornell University) · 2024-06-28

    Preprint · Open access · Senior author

    Neural scaling laws aim to characterize how out-of-sample error behaves as a function of model and training dataset size. Such scaling laws guide allocation of computational resources between model and data processing to minimize error. However, existing theoretical support for neural scaling laws lacks rigor and clarity, entangling the roles of information and optimization. In this work, we develop rigorous information-theoretic foundations for neural scaling laws. This allows us to characterize scaling laws for data generated by a two-layer neural network of infinite width. We observe that the optimal relation between data and model size is linear, up to logarithmic factors, corroborating large-scale empirical investigations. Concise yet general results of the kind we establish may bring clarity to this topic and inform future investigations.

  • Information-Theoretic Foundations for Machine Learning

    arXiv (Cornell University) · 2024-07-17 · 2 citations

    Preprint · Open access · Senior author

    The progress of machine learning over the past decade is undeniable. In retrospect, it is both remarkable and unsettling that this progress was achievable with little to no rigorous theory to guide experimentation. Despite this fact, practitioners have been able to guide their future experimentation via observations from previous large-scale empirical investigations. In this work, we propose a theoretical framework which attempts to provide rigor to existing practices in machine learning. To the theorist, we provide a framework which is mathematically rigorous and leaves open many interesting ideas for future exploration. To the practitioner, we provide a framework whose results are simple, and provide intuition to guide future investigations across a wide range of learning paradigms. Concretely, we provide a theoretical framework rooted in Bayesian statistics and Shannon's information theory which is general enough to unify the analysis of many phenomena in machine learning. Our framework characterizes the performance of an optimal Bayesian learner as it learns from a stream of experience. Unlike existing analyses that weaken with increasing data complexity, our theoretical tools provide accurate insights across diverse machine learning settings. Throughout this work, we derive theoretical results and demonstrate their generality by applying them to derive insights specific to several settings. These settings range from learning from data which is independently and identically distributed under an unknown distribution, to data which is sequential, to data which exhibits hierarchical structure amenable to meta-learning, and finally to data which is not fully explainable under the learner's beliefs (misspecification). These results are particularly relevant as we strive to understand and overcome increasingly difficult machine learning challenges in this endlessly complex world.
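
For readers unfamiliar with the setting of "Prior Diffusiveness and Regret in the Linear-Gaussian Bandit", here is a minimal sketch of Thompson sampling in a linear-Gaussian bandit. It assumes a finite set of action vectors, rewards equal to $a^\top \theta$ plus $\mathcal{N}(0, \sigma^2)$ noise, and the standard conjugate Gaussian posterior update. The function name `thompson_linear_gaussian` and its arguments are hypothetical; this is a textbook version of the algorithm, not code from the paper. It tracks cumulative regret for a single draw of $\theta$ from the prior, whereas Bayesian regret is the average of that quantity over many draws.

```python
import numpy as np


def thompson_linear_gaussian(actions, mu0, Sigma0, sigma, T, rng):
    """Run Thompson sampling on a linear-Gaussian bandit (hypothetical helper).

    actions: (K, d) array of candidate action vectors.
    mu0, Sigma0: prior mean (d,) and covariance (d, d) of the unknown theta.
    sigma: noise standard deviation; reward = a @ theta + N(0, sigma**2).
    Returns (cumulative regret after each step, posterior mean, posterior covariance).
    """
    theta = rng.multivariate_normal(mu0, Sigma0)   # true coefficients, drawn from the prior
    best = np.max(actions @ theta)                  # mean reward of the optimal action
    precision = np.linalg.inv(Sigma0)               # posterior precision (inverse covariance)
    b = precision @ mu0                             # precision-weighted posterior mean
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2                     # guard against round-off asymmetry
        sample = rng.multivariate_normal(cov @ b, cov)  # one draw from the posterior
        a = actions[np.argmax(actions @ sample)]    # act greedily for the sampled theta
        y = a @ theta + sigma * rng.standard_normal()   # noisy observed reward
        precision += np.outer(a, a) / sigma**2      # conjugate Gaussian update
        b += a * y / sigma**2
        total += best - a @ theta                   # expected regret of the chosen action
        regret[t] = total
    cov = np.linalg.inv(precision)
    return regret, cov @ b, cov


rng = np.random.default_rng(0)
d = 3
actions = rng.standard_normal((20, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # unit-norm actions, so r = 1
regret, post_mean, post_cov = thompson_linear_gaussian(
    actions, np.zeros(d), np.eye(d), sigma=0.5, T=200, rng=rng)
print(regret[-1], np.trace(post_cov))
```

Averaging `regret[-1]` over many independent seeds estimates the Bayesian regret that the abstract bounds. The posterior is re-inverted each step for readability; a rank-one Sherman-Morrison update would avoid that cost.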
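
The continual learning monograph listed above poses the objective as maximizing infinite-horizon average reward subject to a computational constraint. One way to write that objective is shown below. The notation is an assumption here, not the monograph's own: $\Pi_C$ is the set of agents whose per-step computation fits within a budget $C$, $R_{t+1}$ is the reward received after step $t$, and the liminf is used so the limit need not exist.

```latex
\max_{\pi \in \Pi_C} \; \liminf_{T \to \infty} \; \mathbb{E}_{\pi}\!\left[ \frac{1}{T} \sum_{t=0}^{T-1} R_{t+1} \right]
```

Under this reading, continual supervised learning corresponds to choosing $R_{t+1} = \ln \hat{p}_t(Y_{t+1})$, the negative log-loss of the agent's predictive distribution $\hat{p}_t$ on the next label $Y_{t+1}$ (again hypothetical notation).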
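
The "Granular feedback merits sophisticated aggregation" abstract describes regularized averaging as computing the empirical distribution and regularizing it toward a prior. A minimal sketch of that baseline follows, assuming the regularization takes the form of additive pseudo-counts. The abstract does not specify the exact form, and `regularized_average` and `strength` are hypothetical names.

```python
import numpy as np


def regularized_average(responses, num_levels, prior, strength):
    """Shrink the empirical feedback distribution toward a prior (hypothetical helper).

    responses: integer feedback levels in {0, ..., num_levels - 1}, one per individual.
    prior: probability vector of length num_levels to regularize toward.
    strength: pseudo-count weight on the prior; 0 returns the empirical distribution.
    """
    counts = np.bincount(responses, minlength=num_levels).astype(float)
    prior = np.asarray(prior, dtype=float)
    return (counts + strength * prior) / (len(responses) + strength)


# Five-point feedback from five individuals, shrunk toward a uniform prior.
estimate = regularized_average([4, 4, 3, 0, 4], num_levels=5,
                               prior=np.full(5, 0.2), strength=5.0)
print(estimate)  # [0.2 0.1 0.1 0.2 0.4]
```

Larger `strength` values pull the estimate further toward the prior, which matters most for small groups. The abstract's finding is that with five-point feedback, more sophisticated aggregation matches this baseline's performance with about half as many individuals.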
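
The "Information-Theoretic Foundations for Neural Scaling Laws" abstract reports that the optimal relation between data and model size is linear, up to logarithmic factors. To see what that implies for allocating compute, assume (this is not from the abstract) that training compute scales as $C = k\,p\,n$ for $p$ parameters and $n$ training examples, and write the linear relation as $n^* = c\,p^*$ for some constant $c$, ignoring the logarithmic factors:

```latex
C = k\,p^*\,n^* = k\,c\,(p^*)^2
\quad\Longrightarrow\quad
p^* = \sqrt{\frac{C}{k\,c}}, \qquad n^* = c\,p^* = \sqrt{\frac{c\,C}{k}}.
```

Under these assumptions, doubling the compute budget scales both the model size and the dataset size by a factor of $\sqrt{2}$.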

Frequent coauthors

  • Ian Osband

    53 shared
  • John N. Tsitsiklis

    Decision Systems (United States)

    33 shared
  • Zheng Wen

    21 shared
  • Gabriel Y. Weintraub

    20 shared
  • C. Lanier Benkard

    18 shared
  • Daniel Russo

    18 shared
  • Vikranth Dwaracherla

    16 shared
  • Ciamac C. Moallemi

    16 shared

Education

  • Ph.D., Electrical Engineering

    Stanford University

    1990
  • M.S., Electrical Engineering

    Stanford University

    1985
  • B.S., Electrical Engineering

    Stanford University

    1981

Awards & honors

  • MIT George C. Newton Undergraduate Laboratory Project Award
  • MIT Morris J. Levin Memorial Master's Thesis Award
  • MIT George M. Sprowls Doctoral Dissertation Award
  • National Science Foundation CAREER Award
  • Stanford Tau Beta Pi Award for Excellence in Undergraduate T…