Jon Kleinberg

Verified

Cornell University · Computer Science

Active 1956–2025

h-index: 122

Citations: 90.8k

Papers: 655 (144 in the last 5 years)

Funding: $3.7M


About

Jon Kleinberg is the Tisch University Professor at Cornell University, affiliated with both the Department of Computer Science and the Department of Information Science. His research centers on algorithms and networks, their roles in large-scale social and information systems, and the broader societal implications of these technologies. His awards include the NSF Career Award, ONR Young Investigator Award, MacArthur Foundation Fellowship, Packard Foundation Fellowship, Simons Investigator Award, Sloan Foundation Fellowship, and Vannevar Bush Faculty Fellowship. His research has been funded by Facebook, Google, Yahoo, the MacArthur and Simons Foundations, and government agencies including AFOSR, ARO, and NSF. He is a member of the National Academy of Sciences, the National Academy of Engineering, the American Academy of Arts and Sciences, and the American Philosophical Society. He has authored books, including "Networks, Crowds, and Markets: Reasoning About a Highly Connected World" and "Algorithm Design," which are used in undergraduate and graduate courses. Since Spring 2021, he has co-taught an introductory course on the ethical, societal, and policy implications of computing and information. He has also advised numerous PhD students and postdoctoral researchers.

Research topics

Computer Science
Artificial Intelligence
Data science
Machine Learning
Natural Language Processing
Data Mining
Risk analysis (engineering)
Management science
Mathematics
Psychology
Business
Engineering
Econometrics
Algorithm
Mathematical economics

Selected publications

Tracking patterns in toxicity and antisocial behavior over user lifetimes on large social media platforms
Scientific Reports · 2025-07-14 · 1 citation
article · Open access · Senior author
An increasing amount of attention has been devoted to the problem of "toxic" or antisocial behavior on social media. In this paper we analyze such behavior at very large scales: over a 14-year time span on nearly 500 million comments from Reddit and Wikipedia, grounded in two different proxies for toxicity. At the individual level, we analyze users' toxicity levels over the course of their time on the site, and find a striking reversal in trends: both Reddit and Wikipedia users tended to become less toxic over their life cycles on the site in the early (pre-2013) history of the site, but more toxic over their life cycles in the later (post-2013) history of the site. We also find that toxicity on Reddit and Wikipedia differs in a key way, with the most toxic behavior on Reddit exhibited in aggregate by the most active users, and the most toxic behavior on Wikipedia exhibited in aggregate by the least active users. Finally, we consider the toxicity of discussion around widely-shared pieces of content, and find that the trends for toxicity in discussion about content bear interesting similarities with the trends for toxicity in discussion by users.
The Backfiring Effect of Weak AI Safety Regulation
ArXiv.org · 2025-03-26
preprint · Open access
Recent policy proposals aim to improve the safety of general-purpose AI, but there is little understanding of the efficacy of different regulatory approaches to AI safety. We present a strategic model that explores the interactions between safety regulation, the general-purpose AI creators, and domain specialists--those who adapt the technology for specific applications. Our analysis examines how different regulatory measures, targeting different parts of the AI development chain, affect the outcome of this game. In particular, we assume AI technology is characterized by two key attributes: safety and performance. The regulator first sets a minimum safety standard that applies to one or both players, with strict penalties for non-compliance. The general-purpose creator then invests in the technology, establishing its initial safety and performance levels. Next, domain specialists refine the AI for their specific use cases, updating the safety and performance levels and taking the product to market. The resulting revenue is then distributed between the specialist and generalist through a revenue-sharing parameter. Our analysis reveals two key insights: First, weak safety regulation imposed predominantly on domain specialists can backfire. While it might seem logical to regulate AI use cases, our analysis shows that weak regulations targeting domain specialists alone can unintentionally reduce safety. This effect persists across a wide range of settings. Second, in sharp contrast to the previous finding, we observe that stronger, well-placed regulation can in fact mutually benefit all players subjected to it. When regulators impose appropriate safety standards on both general-purpose AI creators and domain specialists, the regulation functions as a commitment device, leading to safety and performance gains, surpassing what is achieved under no regulation or regulating one player alone.
Using Large Language Models to Promote Health Equity
NEJM AI · 2025-01-13 · 16 citations
article
Replicating Electoral Success
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11
article · Open access · Senior author
A core tension in the study of plurality elections is the clash between the classic Hotelling-Downs model, which predicts that two office-seeking candidates should cater to the median voter, and the empirical observation that democracies often have two major parties with divergent policies. Motivated in part by this tension, we introduce a dynamic model of candidate positioning based on a simple bounded rationality heuristic: candidates imitate the policy of previous winners. The resulting model is closely connected to evolutionary replicator dynamics. For uniformly-distributed voters, we prove in our model that with k = 2, 3, or 4 candidates per election, any symmetric candidate distribution converges over time to the center. With k ≥ 5 candidates per election, however, we prove that the candidate distribution does not converge to the center and provide an even stronger non-convergence result in a special case with no extreme candidates. Our conclusions are qualitatively unchanged if a small fraction of candidates are not winner-copiers and are instead positioned uniformly at random in each election. Beyond our theoretical analysis, we illustrate our results in extensive simulations; for five or more candidates, we find a tendency towards the emergence of two clusters, a mechanism suggestive of Duverger's Law, the empirical finding that plurality leads to two-party systems. Our simulations also explore several variations of the model, where we find the same general pattern: convergence to the center with four or fewer candidates, but not with five or more. Finally, we discuss the relationship between our replicator dynamics model and prior work on strategic equilibria of candidate positioning games.
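The winner-imitation dynamic described in this abstract can be illustrated with a small finite-population simulation. This is a minimal sketch under stated assumptions, not the authors' model or code: the function names `plurality_winner` and `simulate`, the population size, the number of generations, and the tie-handling rules are all hypothetical choices made here for illustration.

```python
import random


def plurality_winner(positions, rng):
    """Return the winning position among candidates on [0, 1].

    Voters are uniform on [0, 1] and each votes for the nearest candidate.
    Assumption: candidates sharing a position split its votes evenly,
    and exact ties for first place are broken uniformly at random.
    """
    distinct = sorted(set(positions))
    best_share, best = -1.0, []
    for i, p in enumerate(distinct):
        # Each distinct position wins the voters up to the midpoints
        # with its neighbours (or the ends of the interval).
        left = 0.0 if i == 0 else (distinct[i - 1] + p) / 2
        right = 1.0 if i == len(distinct) - 1 else (p + distinct[i + 1]) / 2
        share = (right - left) / positions.count(p)
        if share > best_share + 1e-12:
            best_share, best = share, [p]
        elif abs(share - best_share) <= 1e-12:
            best.append(p)
    return rng.choice(best)


def simulate(k, generations=30, population=2000, seed=0):
    """Evolve a population of positions by copying previous winners.

    Each generation runs `population` independent elections; the k
    candidates in each election copy positions drawn at random from the
    previous generation's winners (hypothetical finite approximation).
    """
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(population)]
    for _ in range(generations):
        pop = [
            plurality_winner([rng.choice(pop) for _ in range(k)], rng)
            for _ in range(population)
        ]
    return pop


if __name__ == "__main__":
    for k in (2, 5):
        final = simulate(k)
        spread = max(final) - min(final)
        print(f"k={k}: spread of winning positions = {spread:.4f}")
```

With two candidates, the one closer to the median voter always wins, so the `k=2` run concentrates near 0.5. Runs with larger `k` can be inspected the same way, though this sketch makes no claim about reproducing the paper's results.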
The Backfiring Effect of Weak AI Safety Regulation
2025-10-29
article
Designing Algorithmic Delegates: The Role of Indistinguishability in Human-AI Handoff
ArXiv.org · 2025-06-03
preprint · Open access · Senior author
As AI technologies improve, people are increasingly willing to delegate tasks to AI agents. In many cases, the human decision-maker chooses whether to delegate to an AI agent based on properties of the specific instance of the decision-making problem they are facing. Since humans typically lack full awareness of all the factors relevant to this choice for a given decision-making instance, they perform a kind of categorization by treating indistinguishable instances -- those that have the same observable features -- as the same. In this paper, we define the problem of designing the optimal algorithmic delegate in the presence of categories. This is an important dimension in the design of algorithms to work with humans, since we show that the optimal delegate can be an arbitrarily better teammate than the optimal standalone algorithmic agent. The solution to this optimal delegation problem is not obvious: we discover that this problem is fundamentally combinatorial, and illustrate the complex relationship between the optimal design and the properties of the decision-making task even in simple settings. Indeed, we show that finding the optimal delegate is computationally hard in general. However, we are able to find efficient algorithms for producing the optimal delegate in several broad cases of the problem, including when the optimal action may be decomposed into functions of features observed by the human and the algorithm. Finally, we run computational experiments to simulate a designer updating an algorithmic delegate over time to be optimized for when it is actually adopted by users, and show that while this process does not recover the optimal delegate in general, the resulting delegate often performs quite well.
Strategic Defense Allocation in Cyber-Physical Sensor Systems Under Dual-Domain Attacks
Lecture notes in computer science · 2025-10-12
book-chapter · Senior author
Density Measures for Language Generation
2025-12-14
article · 1st author · Corresponding
The recent successes of large language models (LLMs) have led to a surge of theoretical research into the properties of language generation. A recent line of work has proposed an abstract view of the question, called language generation in the limit, in which we view language generation as a game played between an adversary and an algorithm: the adversary generates strings from an unknown language K, known only to come from a countable collection of candidate languages, and after observing a finite set of these strings, the algorithm must generate new strings from the language K that it hasn't seen before. This formalism highlights an important tension: the trade-off between validity (that the algorithm should only produce strings that come from the language) and breadth (that the algorithm should be able to produce "many" strings from the language). This validity-breadth trade-off is a central issue in applied work on language generation as well, where it arises in the balance between hallucination, when models generate invalid utterances, and mode collapse, when models only generate from a very restricted set of feasible outputs. Despite its importance, this trade-off has been challenging to study quantitatively. In this work we develop ways of quantifying this trade-off, by formalizing the notion of breadth through measures of density. Roughly speaking, the density of one language L in another language L' is the limiting fraction of strings from L among the strings of L', where we take the limit over longer and longer finite prefixes of L'. Existing algorithms for language generation in the limit produce output sets that can have zero density in the true language K, in this asymptotic sense, and this represents an important failure of breadth that might seem necessary in any solution to the problem.
We show here that such a failure is not in fact necessary: we provide an algorithm for language generation in the limit whose outputs have strictly positive density in the true language K. We also study the internal representations built by algorithms for this problem — the sequence of hypothesized candidate languages they iterate through as they perform generation — showing a precise sense in which the strongest form of breadth achievable is one that may need to "oscillate" indefinitely between hypothesized representations of high density and low density. Our analysis introduces a novel topology on language families, with notions of convergence and limit points in this topology playing a key role in the analysis.
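The informal density notion in this abstract ("the limiting fraction of strings from L among the strings of L'") can be written out as a formula. This is only a sketch under assumptions: the fixed enumeration $w_1, w_2, \dots$ of $L'$ and the symbol $\mu$ are notation introduced here, and the paper's formal definition may differ, for example by using a $\liminf$ or $\limsup$ when the limit does not exist.

```latex
\[
  \mu(L \mid L') \;=\; \lim_{n \to \infty}
    \frac{\lvert L \cap \{w_1, \dots, w_n\} \rvert}{n},
  \qquad L' = \{w_1, w_2, \dots\}.
\]
```

In this notation, the "failure of breadth" the abstract describes corresponds to an output set $O$ with $\mu(O \mid K) = 0$, and the positive result corresponds to $\mu(O \mid K) > 0$.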
Private Blotto: Viewpoint Competition with Polarized Agents
Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11
article · Open access · Senior author
Social media platforms are responsible for collecting and disseminating vast quantities of content. Recently, however, they have also begun enlisting users in helping annotate this content, for example to provide context or label disinformation. However, users may act strategically, sometimes reflecting biases (e.g. political) about the "right" label. How can social media platforms design their systems to use human time most efficiently? Historically, competition over multiple items has been explored in the Colonel Blotto game setting. However, such games were originally designed to model two centrally-controlled armies competing over zero-sum "items", a specific scenario with limited modern-day application. In this work, we propose and study the Private Blotto game, a variant with the key difference that individual agents act independently, without being coordinated by a central "Colonel". We completely characterize the Nash stability of this game and how this impacts the amount of "misallocated effort" of users on unimportant items. We show that the outcome function (aggregating multiple labels on a single item) has a critical impact, and specifically contrast a majority rule outcome (the median) with a smoother outcome function (the mean). In general, for median outcomes we show that instances without stable arrangements only occur for relatively small numbers of agents, but stable arrangements may have very high levels of misallocated effort. For mean outcome functions, we show that unstable arrangements can occur even for arbitrarily large numbers of agents, but when stable arrangements exist, they always have low misallocated effort. We conclude by discussing the implications of our results for motivating examples in social media platforms and political competition.
The Power of Choice in Random Sampling
SSRN Electronic Journal · 2025-01-01
preprint · Open access · Senior author

Recent grants

HCC: Large: Collaborative Research: Design Principles for Information Networks Supporting the Social Production of Knowledge
NSF · $2.6M · 2009–2015
III: Small: Collaborative Research: Mining Information Propagation on the Web
NSF · $80k · 2010–2013
BIGDATA: IA: Harnessing Language and Interaction Dynamics at Multiple Scales to Maximize the Benefits of Group Interaction
NSF · $1.0M · 2017–2020

Frequent coauthors

Sendhil Mullainathan
University of Chicago
77 shared
David Hutchison
Lancaster University
66 shared
Doug Tygar
University of California, Berkeley
66 shared
Friedemann Mattern
66 shared
Bernhard Steffen
TU Dortmund University
66 shared
Gerhard Weikum
66 shared
Moni Naor
65 shared
Oscar Nierstrasz
65 shared

Labs

Jon Kleinberg's Lab · PI
Research on algorithms and networks, their roles in large-scale social and information systems, and broader societal implications.

Education

Ph.D., Computer Science
Cornell University
1996
B.S., Computer Science
Princeton University
1991

Awards & honors

NSF Career Award
ONR Young Investigator Award
MacArthur Foundation Fellowship
Packard Foundation Fellowship
Sloan Foundation Fellowship
