Aaron Roth

· Professor · Verified

University of Pennsylvania · Computer and Information Science

Active 1999–2025

h-index 54

Citations 17.7k

Papers 381 · 115 last 5y

Funding $5.2M



About

Aaron Roth is a professor whose research focuses on privacy, fairness, and security in computer science. He develops algorithms that protect user privacy while preserving the utility of the data, and he works on algorithmic fairness and security. His background is in theoretical computer science, with particular emphasis on the intersection of privacy and machine learning. His key contributions include advancing the theoretical foundations of differential privacy and exploring its applications in real-world systems, with the aim of balancing data utility against privacy guarantees and addressing ethical concerns in algorithmic decision-making.


Research topics

Computer Science
Artificial Intelligence
Machine Learning
Sociology
Algorithm
Data Mining
Political Science
Operating system
Pedagogy
Business
Psychology
Economics
Actuarial science
Theoretical computer science
Medicine
Mathematics
Demography
Programming language
Mathematics education
Econometrics
Database

Selected publications

Stronger Neyman Regret Guarantees for Adaptive Experimental Design
ArXiv.org · 2025-02-24
preprint · Open access · Senior author
We study the design of adaptive, sequential experiments for unbiased average treatment effect (ATE) estimation in the design-based potential outcomes setting. Our goal is to develop adaptive designs offering sublinear Neyman regret, meaning their efficiency must approach that of the hindsight-optimal nonadaptive design. Recent work [Dai et al., 2023] introduced ClipOGD, the first method achieving $\widetilde{O}(\sqrt{T})$ expected Neyman regret under mild conditions. In this work, we propose adaptive designs with substantially stronger Neyman regret guarantees. In particular, we modify ClipOGD to obtain anytime $\widetilde{O}(\log T)$ Neyman regret under natural boundedness assumptions. Further, in the setting where experimental units have pre-treatment covariates, we introduce and study a class of contextual "multigroup" Neyman regret guarantees: Given any set of possibly overlapping groups based on the covariates, the adaptive design outperforms each group's best non-adaptive designs. In particular, we develop a contextual adaptive design with $\widetilde{O}(\sqrt{T})$ anytime multigroup Neyman regret. We empirically validate the proposed designs through an array of experiments.
Sample Efficient Omniprediction and Downstream Swap Regret for Non-Linear Losses
ArXiv.org · 2025-02-18
preprint · Open access
We define "decision swap regret" which generalizes both prediction for downstream swap regret and omniprediction, and give algorithms for obtaining it for arbitrary multi-dimensional Lipschitz loss functions in online adversarial settings. We also give sample complexity bounds in the batch setting via an online-to-batch reduction. When applied to omniprediction, our algorithm gives the first polynomial sample-complexity bounds for Lipschitz loss functions -- prior bounds either applied only to linear loss (or binary outcomes) or scaled exponentially with the error parameter even under the assumption that the loss functions were convex. When applied to prediction for downstream regret, we give the first algorithm capable of guaranteeing swap regret bounds for all downstream agents with non-linear loss functions over a multi-dimensional outcome space: prior work applied only to linear loss functions, modeling risk neutral agents. Our general bounds scale exponentially with the dimension of the outcome space, but we give improved regret and sample complexity bounds for specific families of multidimensional functions of economic interest: constant elasticity of substitution (CES), Cobb-Douglas, and Leontief utility functions.
The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review
Journal of the American Statistical Association · 2025-06-02 · 2 citations
article
Networked Information Aggregation via Machine Learning
ArXiv.org · 2025-07-13
preprint · Open access
We study a distributed learning problem in which learning agents are embedded in a directed acyclic graph (DAG). There is a fixed and arbitrary distribution over feature/label pairs, and each agent or vertex in the graph is able to directly observe only a subset of the features -- potentially a different subset for every agent. The agents learn sequentially in some order consistent with a topological sort of the DAG, committing to a model mapping observations to predictions of the real-valued label. Each agent observes the predictions of their parents in the DAG, and trains their model using both the features of the instance that they directly observe, and the predictions of their parents as additional features. We ask when this process is sufficient to achieve information aggregation, in the sense that some agent in the DAG is able to learn a model whose error is competitive with the best model that could have been learned (in some hypothesis class) with direct access to all features, despite the fact that no single agent in the network has such access. We give upper and lower bounds for this problem for both linear and general hypothesis classes. Our results identify the depth of the DAG as the key parameter: information aggregation can occur over sufficiently long paths in the DAG, assuming that all of the relevant features are well represented along the path, and there are distributions over which information aggregation cannot occur even in the linear case, and even in arbitrarily large DAGs that do not have sufficient depth (such as a hub-and-spokes topology in which the spoke vertices collectively see all the features). We complement our theoretical results with a comprehensive set of experiments.
Decision Theoretic Foundations for Conformal Prediction: Optimal Uncertainty Quantification for Risk-Averse Agents
ArXiv.org · 2025-02-04
preprint · Open access
A fundamental question in data-driven decision making is how to quantify the uncertainty of predictions in ways that can usefully inform downstream action. This interface between prediction uncertainty and decision-making is especially important in risk-sensitive domains, such as medicine. In this paper, we develop decision-theoretic foundations that connect uncertainty quantification using prediction sets with risk-averse decision-making. Specifically, we answer three fundamental questions: (1) What is the correct notion of uncertainty quantification for risk-averse decision makers? We prove that prediction sets are optimal for decision makers who wish to optimize their value at risk. (2) What is the optimal policy that a risk averse decision maker should use to map prediction sets to actions? We show that a simple max-min decision policy is optimal for risk-averse decision makers. Finally, (3) How can we derive prediction sets that are optimal for such decision makers? We provide an exact characterization in the population regime and a distribution free finite-sample construction. Answering these questions naturally leads to an algorithm, Risk-Averse Calibration (RAC), which follows a provably optimal design for deriving action policies from predictions. RAC is designed to be both practical (capable of leveraging the quality of predictions in a black-box manner to enhance downstream utility) and safe (adhering to a user-defined risk threshold and optimizing the corresponding risk quantile of the user's downstream utility). Finally, we experimentally demonstrate the significant advantages of RAC in applications such as medical diagnosis and recommendation systems. Specifically, we show that RAC achieves a substantially improved trade-off between safety and utility, offering higher utility compared to existing methods while maintaining the safety guarantee.
Resolving the Reference Class Problem at Scale
Philosophy of Science · 2025-04-14 · 1 citation
article · Open access · 1st author · Corresponding
We draw a distinction between the traditional reference class problem, which describes an obstruction to estimating a single individual probability and which we rename the individual reference class problem, and what we call the reference class problem at scale, which can result when using tools from statistics and machine learning to systematically make predictions about many individual probabilities simultaneously. We argue that scale actually helps to mitigate the reference class problem, and purely statistical tools can be used to efficiently minimize the reference class problem at scale, even though they cannot be used to solve the individual reference class problem.
Intersectional Fairness in Reinforcement Learning with Large State and Constraint Spaces
ArXiv.org · 2025-02-17
preprint · Open access
In traditional reinforcement learning (RL), the learner aims to solve a single objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimal reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum reward group. We provide oracle-efficient algorithms to solve these multi-objective RL problems even when the number of objectives is exponentially large, for tabular MDPs as well as for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
The Value of Ambiguous Commitments in Multi-Follower Games
SSRN Electronic Journal · 2025-01-01
preprint · Open access · Senior author
Replicable Reinforcement Learning with Linear Function Approximation
ArXiv.org · 2025-09-10
preprint · Open access
Replication of experimental results has been a challenge faced by many scientific disciplines, including the field of machine learning. Recent work on the theory of machine learning has formalized replicability as the demand that an algorithm produce identical outcomes when executed twice on different samples from the same distribution. Provably replicable algorithms are especially interesting for reinforcement learning (RL), where algorithms are known to be unstable in practice. While replicable algorithms exist for tabular RL settings, extending these guarantees to more practical function approximation settings has remained an open problem. In this work, we make progress by developing replicable methods for linear function approximation in RL. We first introduce two efficient algorithms for replicable random design regression and uncentered covariance estimation, each of independent interest. We then leverage these tools to provide the first provably efficient replicable RL algorithms for linear Markov decision processes in both the generative model and episodic settings. Finally, we evaluate our algorithms experimentally and show how they can inspire more consistent neural policies.
Collaborative Prediction: Tractable Information Aggregation via Agreement
ArXiv.org · 2025-04-08
preprint · Open access
We give efficient "collaboration protocols" through which two parties, who observe different features about the same instances, can interact to arrive at predictions that are more accurate than either could have obtained on their own. The parties only need to iteratively share and update their own label predictions, without either party ever having to share the actual features that they observe. Our protocols are efficient reductions to the problem of learning on each party's feature space alone, and so can be used even in settings in which each party's feature space is illegible to the other, which arises in models of human/AI interaction and in multi-modal learning. The communication requirements of our protocols are independent of the dimensionality of the data. In an online adversarial setting we show how to give regret bounds on the predictions that the parties arrive at with respect to a class of benchmark policies defined on the joint feature space of the two parties, despite the fact that neither party has access to this joint feature space. We also give simpler algorithms for the same task in the batch setting in which we assume that there is a fixed but unknown data distribution. We generalize our protocols to a decision theoretic setting with high dimensional outcome spaces, where parties communicate only "best response actions." Our theorems give a computationally and statistically tractable generalization of past work on information aggregation amongst Bayesians who share a common and correct prior, as part of a literature studying "agreement" in the style of Aumann's agreement theorem. Our results require no knowledge of (or even the existence of) a prior distribution and are computationally efficient. Nevertheless we show how to lift our theorems back to this classical Bayesian setting, and in doing so, give new information aggregation theorems for Bayesian agreement.

Recent grants

CAREER: Correctness-Performance Partitioned (CPP) Architectures
NSF · $400k · 2003–2009
FAI: Breaking the Tradeoff Barrier in Algorithmic Fairness
NSF · $393k · 2022–2025
CAREER: The Algorithmic Foundations of Data Privacy
NSF · $484k · 2013–2020
ICES: Large: Economic Foundations of Digital Privacy
NSF · $998k · 2011–2016
TWC: Medium: Distributed Differential Privacy
NSF · $1.2M · 2015–2021

Frequent coauthors

Michael Kearns · 108 shared
Zhiwei Steven Wu · 69 shared
Seth Neel · 44 shared
Jonathan Ullman · Northeastern University · 41 shared
Katrina Ligett · Hebrew University of Jerusalem · 38 shared
Mallesh M. Pai · 30 shared
Jamie Morgenstern · University of Washington · 30 shared
Sampath Kannan · 29 shared
