Abdullah Almaatouq

Douglas Drane Career Development Associate Professor in Information Technology and Management · Verified

Massachusetts Institute of Technology · Information Technology

Active 2013–2026

h-index: 13

Citations: 937

Papers: 56 (37 in the last 5 years)

Funding: not listed

About

Abdullah Almaatouq is the Douglas Drane Career Development Associate Professor in Information Technology and Management at the MIT Sloan School of Management. He is a computational social scientist whose research focuses on improving cooperation, coordination, and collective intelligence in decision-making systems such as teams, committees, crowds, markets, and elections. Abdullah explores ways to advance social and behavioral research methodology through innovative research designs and theory-building strategies, with the goal of developing a deeper understanding of collective decision systems and how to design them effectively in various contexts. He is affiliated with the MIT Center for Computational Engineering, the MIT Center for Collective Intelligence, and the MIT Connection Science Research Initiative. Abdullah holds a PhD in computational science and engineering, along with dual master's degrees in media arts and sciences (MIT Media Lab) and computational science and engineering from MIT. Prior to joining MIT, he earned his undergraduate degree from Southampton University in the United Kingdom.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Psychology
Social psychology
Statistics
Mathematics
Engineering
Medicine
Theoretical computer science
Cognitive psychology
Communication
Algorithm
Geography

Selected publications

Integrative experiments identify how punishment affects welfare in public goods games
Science · 2026-04-09 · 1 citation
article · Senior author · Corresponding
Despite decades of research, the conditions under which punishment promotes cooperation remain unclear. Through an integrative experiment varying 14 design parameters of public goods games across 360 experimental conditions (147,618 decisions from 7100 participants), we reveal substantial heterogeneity in punishment effectiveness: Its impact on welfare ranges from 43% improvement to 44% reduction depending on the game parameters. To characterize these patterns, we developed models that outperformed human forecasters in predicting punishment effectiveness in new experiments. Communication emerges as the most important factor, followed by contribution framing (opt out versus opt in), contribution type (variable versus all-or-nothing), game length, and outcome visibility, though these factors often interact. The results reframe the debate from whether punishment works to when it does, demonstrating how integrative experiments enable discovery of generalizable patterns in social phenomena.
Post-training makes large language models less human-like
arXiv (Cornell University) · 2026-05-08
preprint · Open access
Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.
Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
arXiv (Cornell University) · 2026-03-06
preprint · Open access
Current frontier AI safety evaluations emphasize static benchmarks, third-party annotations, and red-teaming. In this position paper, we argue that AI safety research should focus on human-centered evaluations that measure harmful capability uplift: the marginal increase in a user's ability to cause harm with a frontier model beyond what conventional tools already enable. We frame harmful capability uplift as a core AI safety metric, ground it in prior social science research, and provide concrete methodological guidance for systematic measurement. We conclude with actionable steps for developers, researchers, funders, and regulators to make harmful capability uplift evaluation a standard practice.
The Integration of Explanation and Prediction in Behavioral Science
Current Directions in Psychological Science · 2026-04-10
article · 1st author · Corresponding
Behavioral scientists aim to explain and predict behavior. In principle, these goals align; in practice, common approaches to pursuing them have become distinct traditions in tension with one another. The explanatory tradition often examines causal factors in isolation, establishing that they have some effect but not how much or how they combine. The predictive tradition learns how factors combine, but these patterns may not reflect a causal structure or hold when conditions change. Answering how much each factor matters, and how they combine across settings, requires both predictive accuracy and causal interpretation. This article examines three developments toward this integration: evaluation frameworks that emphasize generalization, systematic experimentation and flexible models, and interpretation tools. We present recent empirical examples that demonstrate how this integration enables the discovery of generalizable patterns and provides a path toward cumulative behavioral science.
The Task Space: An Integrative Framework for Team Research
2025-12-01
preprint · Open access · Senior author
Research on teams spans many contexts, but integrating knowledge from heterogeneous sources is challenging because studies typically examine different tasks that cannot be directly compared. Most investigations involve teams working on just one or a handful of tasks, and researchers lack principled ways to quantify how similar or different these tasks are from one another. We address this challenge by introducing the “Task Space,” a multidimensional space in which tasks—and the distances between them—can be represented formally, and use it to create a “Task Map” of 102 crowd-annotated tasks from the published experimental literature. We then demonstrate the Task Space’s utility by performing an integrative experiment that addresses a fundamental question in team research: when do interacting groups outperform individuals? Our experiment samples 20 diverse tasks from the Task Map at three complexity levels and recruits 1,231 participants to work either individually or in groups of three or six (180 experimental conditions). We find striking heterogeneity in group advantage, with groups performing anywhere from three times worse to 60% better than the best individual working alone, depending on the task context. Critically, the Task Space makes this heterogeneity predictable: it significantly outperforms traditional typologies in predicting group advantage on unseen tasks. Our models also reveal theoretically meaningful interactions between task features; for example, group advantage on creative tasks depends on whether the answers are objectively verifiable. We conclude by arguing that the Task Space enables researchers to integrate findings across different experiments, thereby building cumulative knowledge about team performance.
Integrative Experiments Identify How Punishment Affects Welfare in Public Goods Games
Open MIND · 2025-01-01
other · Senior author
Reproducibility package for "Integrative Experiments Identify How Punishment Affects Welfare in Public Goods Games" (https://www.science.org/doi/10.1126/science.aeb5280)
The Task Space: An Integrative Framework for Team Research
PsyArXiv (OSF Preprints) · 2025-10-14
other · Open access
Studying collective intelligence in the lab
Edward Elgar Publishing Limited eBooks · 2025-12-11
book-chapter · 1st author · Corresponding

Frequent coauthors

P. M. Krafft
University of the Arts London
43 shared
Mehdi Moussaïd
Max Planck Institute for Human Development
31 shared
Alex Pentland
Massachusetts Institute of Technology
30 shared
Abdulrahman Alotaibi
Moscow Institute of Thermal Technology
30 shared
Alejandro Noriega-Campero
Massachusetts Institute of Technology
30 shared
Alex Pentland
Human Media
27 shared
Duncan J. Watts
University of Pennsylvania
15 shared
David G. Rand
Massachusetts Institute of Technology
14 shared

Labs

GenAI Lab · PI

Education

Master of Science, Media Lab
Massachusetts Institute of Technology
Master of Science, Center for Computational Engineering
Massachusetts Institute of Technology
PhD, Computational Science & Engineering, Computational Engineering
Massachusetts Institute of Technology
2019
Bachelor of Science, Electronics and Computer Science
University of Southampton
2012
