Sampath Kannan

· Professor

University of Pennsylvania · Computer and Information Science

Active 1988–2025

h-index 33

Citations 4.5k

Papers 150 · 18 last 5y

Funding $610k

Faculty page

OpenAlex

Research topics

Computer Science
Political Science
Sociology
Artificial Intelligence
Demography
Actuarial science
Mathematical optimization
Mathematics education
Medicine
Econometrics
Mathematics
Combinatorics
Business
Psychology
Engineering
Economics
Algorithm
Pedagogy

Selected publications

Nearly Tight Bounds on Testing of Metric Properties
Society for Industrial and Applied Mathematics eBooks · 2025-01-01
book-chapter · Open access
Given a non-negative $n \times n$ matrix viewed as a set of distances between $n$ points, we consider the property testing problem of deciding if it is a metric. We also consider the same problem for two special classes of metrics, tree metrics and ultrametrics. For general metrics, our paper is the first to consider these questions. We prove an upper bound of $O(n^{2/3}/ε^{4/3})$ on the query complexity for this problem. Our algorithm is simple, but the analysis requires great care in bounding the variance on the number of violating triangles in a sample. When $ε$ is a slowly decreasing function of $n$ (rather than a constant, as is standard), we prove a lower bound of matching dependence on $n$ of $Ω(n^{2/3})$, ruling out any property testers with $o(n^{2/3})$ query complexity unless their dependence on $1/ε$ is super-polynomial.
Publisher OA PDF DOI
Algorithmic Collusion Without Threats
arXiv (Cornell University) · 2024-09-06 · 2 citations
preprint · Open access
There has been substantial recent concern that pricing algorithms might learn to "collude." Supra-competitive prices can emerge as a Nash equilibrium of repeated pricing games, in which sellers play strategies which threaten to punish their competitors who refuse to support high prices, and these strategies can be automatically learned. In fact, a standard economic intuition is that supra-competitive prices emerge from either the use of threats, or a failure of one party to optimize their payoff. Is this intuition correct? Would preventing threats in algorithmic decision-making prevent supra-competitive prices when sellers are optimizing for their own revenue? No. We show that supra-competitive prices can emerge even when both players are using algorithms which do not encode threats, and which optimize for their own revenue. We study sequential pricing games in which a first mover deploys an algorithm and then a second mover optimizes within the resulting environment. We show that if the first mover deploys any algorithm with a no-regret guarantee, and then the second mover even approximately optimizes within this now static environment, monopoly-like prices arise. The result holds for any no-regret learning algorithm deployed by the first mover and for any pricing policy of the second mover that earns them a profit at least as high as random pricing would -- and hence the result applies even when the second mover is optimizing only within a space of non-responsive pricing distributions which are incapable of encoding threats. In fact, there exists a set of strategies, neither of which explicitly encodes threats, that form a Nash equilibrium of the simultaneous pricing game in algorithm space and lead to near-monopoly prices. This suggests that the definition of "algorithmic collusion" may need to be expanded, to include strategies without explicitly encoded threats.
Publisher OA PDF DOI
Nearly Tight Bounds on Testing of Metric Properties
arXiv (Cornell University) · 2024-11-13
preprint · Open access
Given a non-negative $n \times n$ matrix viewed as a set of distances between $n$ points, we consider the property testing problem of deciding if it is a metric. We also consider the same problem for two special classes of metrics, tree metrics and ultrametrics. For general metrics, our paper is the first to consider these questions. We prove an upper bound of $O(n^{2/3}/ε^{4/3})$ on the query complexity for this problem. Our algorithm is simple, but the analysis requires great care in bounding the variance on the number of violating triangles in a sample. When $ε$ is a slowly decreasing function of $n$ (rather than a constant, as is standard), we prove a lower bound of matching dependence on $n$ of $Ω(n^{2/3})$, ruling out any property testers with $o(n^{2/3})$ query complexity unless their dependence on $1/ε$ is super-polynomial. Next, we turn to tree metrics and ultrametrics. While there were known upper and lower bounds, we considerably improve these bounds showing essentially tight bounds of $\tilde{O}(1/ε)$ on the sample complexity. We also show a lower bound of $Ω( 1/ε^{4/3} )$ on the query complexity. Our upper bounds are derived by doing a more careful analysis of a natural, simple algorithm. For the lower bounds, we construct distributions on NO instances, where it is hard to find a witness showing that these are not ultrametrics.
Publisher OA PDF DOI
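The abstract above describes the tester only as "simple" and analyzed through violating triangles in a sample. As an illustration of that idea, here is a minimal Python sketch of a naive sampling tester. It is not the paper's algorithm: the names `violates_triangle` and `sample_triangle_tester`, the choice to query every pair among the sampled points, and the symmetric toy matrices are all assumptions made for this example.

```python
import itertools
import random


def violates_triangle(d, i, j, k):
    """Return True if the triple (i, j, k) breaks the triangle inequality in d.

    Assumes d is a symmetric matrix of non-negative distances.
    """
    return (d[i][j] > d[i][k] + d[k][j]
            or d[i][k] > d[i][j] + d[j][k]
            or d[j][k] > d[j][i] + d[i][k])


def sample_triangle_tester(d, sample_size, rng):
    """Accept (True) unless a violating triangle shows up among sampled points.

    Hypothetical sketch: sample points uniformly without replacement, query
    all pairwise distances among them, and reject on the first violation.
    """
    n = len(d)
    points = rng.sample(range(n), min(sample_size, n))
    return not any(violates_triangle(d, i, j, k)
                   for i, j, k in itertools.combinations(points, 3))


# Toy inputs (assumed for illustration): points on a line form a true metric.
line = [[abs(i - j) for j in range(6)] for i in range(6)]
broken = [row[:] for row in line]
broken[0][5] = broken[5][0] = 100  # far longer than any two-hop path

accepts_metric = sample_triangle_tester(line, 6, random.Random(0))
accepts_broken = sample_triangle_tester(broken, 6, random.Random(0))
```

A tester of this shape has one-sided error: a true metric contains no violating triangle, so it is never rejected. How large the sample must be to catch matrices that are far from metric is the question the query-complexity bounds address.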
Oracle Efficient Algorithms for Groupwise Regret
arXiv (Cornell University) · 2023-10-07
preprint · Open access
We study the problem of online prediction, in which at each time step $t$, an individual $x_t$ arrives, whose label we must predict. Each individual is associated with various groups, defined based on their features such as age, sex, race etc., which may intersect. Our goal is to make predictions that have regret guarantees not just overall but also simultaneously on each sub-sequence comprised of the members of any single group. Previous work such as [Blum & Lykouris] and [Lee et al] provide attractive regret guarantees for these problems; however, these are computationally intractable on large model classes. We show that a simple modification of the sleeping experts technique of [Blum & Lykouris] yields an efficient reduction to the well-understood problem of obtaining diminishing external regret absent group considerations. Our approach gives similar regret guarantees compared to [Blum & Lykouris]; however, we run in time linear in the number of groups, and are oracle-efficient in the hypothesis class. This in particular implies that our algorithm is efficient whenever the number of groups is polynomially bounded and the external-regret problem can be solved efficiently, an improvement on [Blum & Lykouris]'s stronger condition that the model class must be small. Our approach can handle online linear regression and online combinatorial optimization problems like online shortest paths. Beyond providing theoretical regret bounds, we evaluate this algorithm with an extensive set of experiments on synthetic data and on two real data sets -- Medical costs and the Adult income dataset, both instantiated with intersecting groups defined in terms of race, sex, and other demographic characteristics. We find that uniformly across groups, our algorithm gives substantial error improvements compared to running a standard online linear regression algorithm with no groupwise regret guarantees.
Publisher OA PDF DOI
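To make the objective in the abstract above concrete, the following Python sketch computes regret separately on each group's subsequence, relative to the best fixed expert in hindsight on that subsequence. It only illustrates the quantity being bounded. It is not the sleeping-experts reduction from the paper, and the function name, argument layout, and toy data are assumptions.

```python
def groupwise_regret(learner_losses, expert_losses, memberships):
    """Regret of the learner on each group's subsequence.

    learner_losses[t]   : loss the learner paid at round t
    expert_losses[t][e] : loss expert e would have paid at round t
    memberships[t]      : set of group names the round-t individual belongs to
    """
    groups = set().union(*memberships)
    n_experts = len(expert_losses[0])
    regret = {}
    for g in groups:
        # Rounds on which a member of group g arrived; groups may intersect.
        rounds = [t for t, m in enumerate(memberships) if g in m]
        paid = sum(learner_losses[t] for t in rounds)
        best = min(sum(expert_losses[t][e] for t in rounds)
                   for e in range(n_experts))
        regret[g] = paid - best
    return regret


# Toy sequence (assumed): four rounds, two experts, two overlapping groups.
learner_losses = [1, 1, 1, 0]
expert_losses = [[0, 1], [1, 0], [0, 1], [0, 1]]
memberships = [{"A"}, {"A", "B"}, {"B"}, {"B"}]

regret = groupwise_regret(learner_losses, expert_losses, memberships)
```

Because the round-1 individual belongs to both groups, that round counts toward the regret of both A and B, which is what makes the simultaneous guarantee harder than overall regret.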
Wealth Dynamics Over Generations: Analysis and Interventions
2023-02-01 · 1 citation
article
We present a stylized model with feedback loops for the evolution of a population's wealth over generations. Individuals have both talent and wealth: talent is a random variable distributed identically for everyone, but wealth is a random variable that is dependent on the population one is born into. Individuals then apply to a downstream agent, which we treat as a university throughout the paper (but could also represent an employer) who makes a decision about whether to admit them or not. The university does not directly observe talent or wealth, but rather a signal (representing e.g. a standardized test) that is a convex combination of both. The university knows the distributions from which an individual's type and wealth are drawn, and makes its decisions based on the posterior distribution of the applicant's characteristics conditional on their population and signal. Each population's wealth distribution at the next round then depends on the fraction of that population that was admitted by the university at the previous round. We study wealth dynamics in this model, and give conditions under which the dynamics have a single attracting fixed point (which implies population wealth inequality is transitory), and conditions under which it can have multiple attracting fixed points (which implies that population wealth inequality can be persistent). In the case in which there are multiple attracting fixed points, we study interventions aimed at eliminating or mitigating inequality, including increasing the capacity of the university to admit more people, aligning the signal generated by individuals with the preferences of the university, and making direct monetary transfers to the less wealthy population.
Publisher DOI
Wealth Dynamics Over Generations: Analysis and Interventions
arXiv (Cornell University) · 2022-09-15 · 1 citation
preprint · Open access
We present a stylized model with feedback loops for the evolution of a population's wealth over generations. Individuals have both talent and wealth: talent is a random variable distributed identically for everyone, but wealth is a random variable that is dependent on the population one is born into. Individuals then apply to a downstream agent, which we treat as a university throughout the paper (but could also represent an employer) who makes a decision about whether to admit them or not. The university does not directly observe talent or wealth, but rather a signal (representing e.g. a standardized test) that is a convex combination of both. The university knows the distributions from which an individual's type and wealth are drawn, and makes its decisions based on the posterior distribution of the applicant's characteristics conditional on their population and signal. Each population's wealth distribution at the next round then depends on the fraction of that population that was admitted by the university at the previous round. We study wealth dynamics in this model, and give conditions under which the dynamics have a single attracting fixed point (which implies population wealth inequality is transitory), and conditions under which it can have multiple attracting fixed points (which implies that population wealth inequality can be persistent). In the case in which there are multiple attracting fixed points, we study interventions aimed at eliminating or mitigating inequality, including increasing the capacity of the university to admit more people, aligning the signal generated by individuals with the preferences of the university, and making direct monetary transfers to the less wealthy population.
Publisher OA PDF DOI
Reconstructing Ultrametric Trees from Noisy Experiments
arXiv (Cornell University) · 2022-06-15
preprint · Open access · Senior author
The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of 'experiments' on triples of leaves (a,b,c), which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can be equivalently seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these 'experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are very close to each other in the tree, the result of the experiment should be close to uniformly random. Motivated by this, we consider a model where the noise on any triple is just dependent on the three pairwise distances (referred to as distance-based noise). Our results are the following: 1. Suppose the length of every edge in the unknown tree is at least $\tilde{O}(\frac{1}{\sqrt n})$ fraction of the length of a root-leaf path. Then, we give an efficient algorithm to reconstruct the topology of the tree for a broad family of distance-based noise models. Further, we show that if the edges are asymptotically shorter, then topology reconstruction is information-theoretically impossible. 2. Further, for a specific distance-based noise model--which we refer to as the homogeneous noise model--we show that the edge weights can also be approximately reconstructed under the same quantitative lower bound on the edge lengths.
Publisher OA PDF DOI
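The noiseless version of the triple experiment described above is easy to state in code. The Python sketch below returns the closest pair among three leaves of an ultrametric, which the abstract says is equivalent to the pair with the deepest least common ancestor. The function name and the four-leaf distance matrix are hypothetical, and the noise models studied in the paper are not modeled here.

```python
def closest_pair_experiment(dist, a, b, c):
    """Noiseless experiment: return the pair of leaves at the smallest distance."""
    pairs = [(a, b), (a, c), (b, c)]
    return min(pairs, key=lambda p: dist[p[0]][p[1]])


# Hypothetical ultrametric on leaves 0..3 for the tree ((0,1),(2,3)):
# siblings are at distance 2, leaves in different subtrees at distance 4.
dist = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

first = closest_pair_experiment(dist, 0, 1, 2)
second = closest_pair_experiment(dist, 0, 2, 3)
```

In an ultrametric the two largest distances in any triple are equal, so the closest pair is unique whenever the three leaves do not split at a single ancestor.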
Pipeline Interventions
Mathematics of Operations Research · 2022-05-16 · 1 citation
article
We introduce the pipeline intervention problem, defined by a layered directed acyclic graph and a set of stochastic matrices governing transitions between successive layers. The graph is a stylized model for how people from different populations are presented opportunities, eventually leading to some reward. In our model, individuals are born into an initial position (i.e., some node in the first layer of the graph) according to a fixed probability distribution and then stochastically progress through the graph according to the transition matrices until they reach a node in the final layer of the graph; each node in the final layer has a reward associated with it. The pipeline intervention problem asks how to best make costly changes to the transition matrices governing people’s stochastic transitions through the graph subject to a budget constraint. We consider two objectives: social welfare maximization and a fairness-motivated maximin objective that seeks to maximize the value to the population (starting node) with the least expected value. We consider two variants of the maximin objective that turn out to be distinct, depending on whether we demand a deterministic solution or allow randomization. For each objective, we give an efficient approximation algorithm (an additive fully polynomial-time approximation scheme) for constant-width networks. We also tightly characterize the “price of fairness” in our setting: the ratio between the highest achievable social welfare and the social welfare consistent with a maximin optimal solution. Finally, we show that, for polynomial-width networks, even approximating the maximin objective to any constant factor is NP hard even for networks with constant depth. This shows that the restriction on the width in our positive results is essential.
Publisher DOI
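The two objectives in the abstract above can be evaluated for a fixed set of transition matrices by propagating the final-layer rewards backward through the layers. The Python sketch below does only that evaluation step. It does not implement the budgeted intervention or the approximation scheme from the paper, and the function names and the two-layer toy network are assumptions.

```python
def expected_rewards(transitions, rewards):
    """Expected final reward for each first-layer node.

    transitions[l][i][j] : probability of moving from node i in layer l
                           to node j in layer l + 1 (each row sums to 1)
    rewards[j]           : reward attached to node j of the final layer
    """
    values = list(rewards)
    # Work backward: a node's value is the expected value of its successor.
    for matrix in reversed(transitions):
        values = [sum(p * v for p, v in zip(row, values)) for row in matrix]
    return values


def social_welfare(start_dist, values):
    """Expected reward of an individual born according to start_dist."""
    return sum(p * v for p, v in zip(start_dist, values))


def maximin_value(values):
    """Value of the starting node (population) with the least expected reward."""
    return min(values)


# Toy network (assumed): three layers of width 2, so two transition matrices.
transitions = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.5, 0.5], [0.0, 1.0]],
]
rewards = [0.0, 4.0]

values = expected_rewards(transitions, rewards)
welfare = social_welfare([0.5, 0.5], values)
worst_off = maximin_value(values)
```

An intervention in this picture would change entries of `transitions` at some cost. The welfare objective scores the change by `social_welfare`, while the fairness-motivated objective scores it by `maximin_value`.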
Best vs. All: Equity and Accuracy of Standardized Test Score Reporting
2022 ACM Conference on Fairness, Accountability, and Transparency · 2022-06-20 · 2 citations
article
We study a game theoretic model of standardized testing for college admissions. Students are of two types: High and Low. There is a college that would like to admit the High type students. Students take a potentially costly standardized exam which provides a noisy signal of their type.
Publisher DOI
Best vs. All: Equity and Accuracy of Standardized Test Score Reporting
arXiv (Cornell University) · 2021 · 2 citations
1st author · Corresponding
- Computer Science
- Political Science
- Artificial Intelligence
We study a game theoretic model of standardized testing for college admissions. Students are of two types: High and Low. There is a college that would like to admit the High type students. Students take a potentially costly standardized exam which provides a noisy signal of their type. The students come from two populations, which are identical in talent (i.e. the type distribution is the same), but differ in their access to resources: the higher resourced population can at their option take the exam multiple times, whereas the lower resourced population can only take the exam once. We study two models of score reporting, which capture existing policies used by colleges. The first policy (sometimes known as "super-scoring") allows students to report the max of the scores they achieve. The other policy requires that all scores be reported. We find in our model that requiring that all scores be reported results in superior outcomes in equilibrium, both from the perspective of the college (the admissions rule is more accurate), and from the perspective of equity across populations: a student's probability of admission is independent of their population, conditional on their type. In particular, the false positive rates and false negative rates are identical in this setting, across the highly and poorly resourced student populations. This is the case despite the fact that the more highly resourced students can -- at their option -- either report a more accurate signal of their type, or pool with the lower resourced population under this policy.
Publisher OA PDF