Michael L. Littman

University Professor of Computer Science

Brown University · Computer Science

Active 1957–2025

h-index: 83

Citations: 45.4k

Papers: 454 (84 in the last 5 years)

Funding: $3.6M

About

Michael L. Littman is a University Professor of Computer Science and serves as the Associate Provost for Artificial Intelligence at Brown University. His primary research areas include Artificial Intelligence, Machine Learning, Reinforcement Learning, and Robotics, with a secondary focus on Algorithmic Fairness. Dr. Littman's work advances the understanding and development of intelligent systems, contributing to AI and machine learning through research and leadership. His role at Brown combines academic and administrative responsibilities, supporting the university's initiatives in artificial intelligence and related disciplines.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Sociology
Political Science
Psychology
Algorithm
Management science
Mathematical optimization
Cognitive psychology
Cognitive science
Mathematical analysis
Mathematics
Human–computer interaction
Library science

Selected publications

Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
2025-01-01 · 2 citations
article · Open access
Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael Littman, Stephen Bach. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
Enabling End Users to Program Robots Using Reinforcement Learning
2025-03-04
article
Reinforcement learning (RL) is a powerful learning technique in robotics, where people can specify rewards that robots learn how to maximize through a process of trial and error. Despite the numerous advantages of RL to robot programming, no approaches to our knowledge have sought to enable non-technical users to specify RL programs for robots. In this work, we designed two novel RL-based robot programming paradigms for non-technical users: Full MDP Programming (Full-MDP) and Goal-Only MDP Programming (Goal-MDP). To evaluate the efficacy of these two approaches, we ran a between-subjects online user study ($N$ = 409) where participants were asked to program a simulated robot to complete example household tasks (e.g., delivering coffee) using one of our RL programming paradigms or a commonly used baseline: Sequential Programming (Seq), or Trigger-Action Programming (TAP). While users neither performed well nor reported positive experiences with the Full-MDP interface, user performance and experience with Goal-MDP was similar to the baselines (Seq and TAP) with significantly shorter programs. These results demonstrate that RL-based paradigms like Goal-MDP are a viable alternative to more traditional approaches and provide a starting point for robot programming interfaces that allow end-users to leverage the myriad benefits of RL for programming robots.
Knowledge Retention for Continual Model-Based Reinforcement Learning
ArXiv.org · 2025-03-06
preprint · Open access
We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.
How Humans Communicate Programming Tasks in Natural Language and Implications For End-User Programming with LLMs
2025-04-24 · 2 citations
article
Large language models (LLMs) like GPT-4 can convert natural-language descriptions of a task into computer code, making them a promising interface for end-user programming. We undertake a systematic analysis of how people with and without programming experience describe information-processing tasks (IPTs) in natural language, focusing on the characteristics of successful communication. Across two online between-subjects studies, we paired crowdworkers either with one another or with an LLM, asking senders (always humans) to communicate IPTs in natural language to their receiver (either a human or LLM). Both senders and receivers tried to answer test cases, the latter based on their sender’s description. While participants with programming experience tended to communicate IPTs more successfully than non-programmers, this advantage was not overwhelming. Furthermore, a user interface that solicited example test cases from senders often, but not always, improved IPT communication. Allowing receivers to request clarification, though, was less successful at improving communication.
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
arXiv (Cornell University) · 2024-07-03
preprint · Open access
Recent works have explored using language models for planning problems. One approach examines translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). Existing evaluation methods struggle to ensure semantic correctness and rely on simple or unrealistic datasets. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks. Planetarium features a novel PDDL equivalence algorithm that flexibly evaluates the correctness of generated PDDL, along with a dataset of 145,918 text-to-PDDL pairs across 73 unique state combinations with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models that reveal this task's complexity. For example, 96.1% of the PDDL problem descriptions generated by GPT-4o are syntactically parseable, 94.4% are solvable, but only 24.8% are semantically correct, highlighting the need for a more rigorous benchmark for this problem.
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
2024-01-01 · 2 citations
article
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
arXiv (Cornell University) · 2024-07-10
preprint · Open access
Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to -- or knowledge of -- an underlying, unobservable state space. Our metric, the $λ$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($λ$) with a different value of $λ$. Since TD($λ{=}0$) makes an implicit Markov assumption and TD($λ{=}1$) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $λ$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $λ$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $λ$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.
A domain-agnostic approach for characterization of lifelong learning systems
Neural Networks · 2023-01-20 · 16 citations
article · Open access
NSF on Chien's Grand Challenge for Sustainability
Communications of the ACM · 2023-04-21 · 3 citations
article
Social is special: A normative framework for teaching with and learning from evaluative feedback
2023-05-26 · 4 citations
preprint · Open access
Humans often attempt to influence one another’s behavior using rewards and punishments. How does this work? Psychologists have often assumed that “evaluative feedback” influences behavior via standard learning mechanisms that learn from environmental contingencies. On this view, teaching with evaluative feedback involves leveraging learning systems designed to maximize an organism’s positive outcomes. Yet, despite its parsimony, programs of research predicated on this assumption, such as ones in developmental psychology, animal behavior, and human-robot interaction, have had limited success. We offer an explanation by analyzing the logic of evaluative feedback and show that specialized learning mechanisms are uniquely favored in the case of evaluative feedback from a social partner. Specifically, evaluative feedback works best when it is treated as communicating information about the value of an action rather than as a form of reward to be maximized. This account suggests that human learning from evaluative feedback depends on inferences about communicative intent, goals and other mental states—much like learning from other sources, such as demonstration, observation and instruction. Because these abilities are especially developed in humans, the present account also explains why evaluative feedback is far more widespread in humans than non-human animals.

Recent grants

RI: Small: Understanding Value-based Multiagent Learning and Its Applications
NSF · $157k · 2013–2016
RI: Small: Collaborative Research: Speeding Up Learning through Modeling the Pragmatics of Training
NSF · $156k · 2013–2016
HSD-DRU: The Role of Communication in the Dynamics of Effective Decision Making
NSF · $685k · 2007–2011
RI: Collaborative Research: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
NSF · $241k · 2007–2012
ITR: Collaborative Research: Representation and Learning in Computational Game theory
NSF · $375k · 2003–2009

Frequent coauthors

James MacGlashan
56 shared
Mark K. Ho
New York University
41 shared
David Abel
38 shared
Kavosh Asadi
35 shared
Leslie Pack Kaelbling
32 shared
George Konidaris
Brown University
29 shared
Dilip Arumugam
Stanford University
27 shared
Stefanie Tellex
Brown University
22 shared

Education

Ph.D., Computer Science
Brown University
1996
M.S., Computer Science
Yale University
1988
B.S., Computer Science
Yale University
1988
