Michael L. Littman
University Professor of Computer Science · Brown University · Computer Science
Active 1957–2025
About
Michael L. Littman is a University Professor of Computer Science and serves as the Associate Provost for Artificial Intelligence at Brown University. His primary research areas include Artificial Intelligence, Machine Learning, Reinforcement Learning, and Robotics, with a secondary focus on Algorithmic Fairness. Dr. Littman's work involves advancing the understanding and development of intelligent systems, contributing to the fields of AI and machine learning through research and leadership. His role at Brown encompasses both academic and administrative responsibilities, supporting the university's initiatives in artificial intelligence and related disciplines.
Research topics
- Artificial Intelligence
- Computer Science
- Machine Learning
- Sociology
- Political Science
- Psychology
- Algorithm
- Management science
- Mathematical optimization
- Cognitive psychology
- Cognitive science
- Mathematical analysis
- Mathematics
- Human–computer interaction
- Library science
Selected publications
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
2025-01-01 · 2 citations
article · Open access
Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael Littman, Stephen Bach. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.
Enabling End Users to Program Robots Using Reinforcement Learning
2025-03-04
article
Reinforcement learning (RL) is a powerful learning technique in robotics, where people can specify rewards that robots learn how to maximize through a process of trial and error. Despite the numerous advantages of RL to robot programming, no approaches to our knowledge have sought to enable non-technical users to specify RL programs for robots. In this work, we designed two novel RL-based robot programming paradigms for non-technical users: Full MDP Programming (Full-MDP) and Goal-Only MDP Programming (Goal-MDP). To evaluate the efficacy of these two approaches, we ran a between-subjects online user study ($N = 409$) where participants were asked to program a simulated robot to complete example household tasks (e.g., delivering coffee) using one of our RL programming paradigms or a commonly used baseline: Sequential Programming (Seq), or Trigger-Action Programming (TAP). While users neither performed well nor reported positive experiences with the Full-MDP interface, user performance and experience with Goal-MDP were similar to the baselines (Seq and TAP) with significantly shorter programs. These results demonstrate that RL-based paradigms like Goal-MDP are a viable alternative to more traditional approaches and provide a starting point for robot programming interfaces that allow end-users to leverage the myriad benefits of RL for programming robots.
Knowledge Retention for Continual Model-Based Reinforcement Learning
ArXiv.org · 2025-03-06
preprint · Open access
We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.
2025-04-24 · 2 citations
article
Large language models (LLMs) like GPT-4 can convert natural-language descriptions of a task into computer code, making them a promising interface for end-user programming. We undertake a systematic analysis of how people with and without programming experience describe information-processing tasks (IPTs) in natural language, focusing on the characteristics of successful communication. Across two online between-subjects studies, we paired crowdworkers either with one another or with an LLM, asking senders (always humans) to communicate IPTs in natural language to their receiver (either a human or LLM). Both senders and receivers tried to answer test cases, the latter based on their sender’s description. While participants with programming experience tended to communicate IPTs more successfully than non-programmers, this advantage was not overwhelming. Furthermore, a user interface that solicited example test cases from senders often, but not always, improved IPT communication. Allowing receivers to request clarification, though, was less successful at improving communication.
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
arXiv (Cornell University) · 2024-07-03
preprint · Open access
Recent works have explored using language models for planning problems. One approach examines translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). Existing evaluation methods struggle to ensure semantic correctness and rely on simple or unrealistic datasets. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks. Planetarium features a novel PDDL equivalence algorithm that flexibly evaluates the correctness of generated PDDL, along with a dataset of 145,918 text-to-PDDL pairs across 73 unique state combinations with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models that reveal this task's complexity. For example, 96.1% of the PDDL problem descriptions generated by GPT-4o are syntactically parseable, 94.4% are solvable, but only 24.8% are semantically correct, highlighting the need for a more rigorous benchmark for this problem.
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
2024-01-01 · 2 citations
article
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
arXiv (Cornell University) · 2024-07-10
preprint · Open access
Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to -- or knowledge of -- an underlying, unobservable state space. Our metric, the $λ$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($λ$) with a different value of $λ$. Since TD($λ{=}0$) makes an implicit Markov assumption and TD($λ{=}1$) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $λ$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $λ$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $λ$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.
A domain-agnostic approach for characterization of lifelong learning systems
Neural Networks · 2023-01-20 · 16 citations
article · Open access
NSF on Chien's Grand Challenge for Sustainability
Communications of the ACM · 2023-04-21 · 3 citations
article
No abstract available.
Social is special: A normative framework for teaching with and learning from evaluative feedback
2023-05-26 · 4 citations
preprint · Open access
Humans often attempt to influence one another’s behavior using rewards and punishments. How does this work? Psychologists have often assumed that “evaluative feedback” influences behavior via standard learning mechanisms that learn from environmental contingencies. On this view, teaching with evaluative feedback involves leveraging learning systems designed to maximize an organism’s positive outcomes. Yet, despite its parsimony, programs of research predicated on this assumption, such as ones in developmental psychology, animal behavior, and human-robot interaction, have had limited success. We offer an explanation by analyzing the logic of evaluative feedback and show that specialized learning mechanisms are uniquely favored in the case of evaluative feedback from a social partner. Specifically, evaluative feedback works best when it is treated as communicating information about the value of an action rather than as a form of reward to be maximized. This account suggests that human learning from evaluative feedback depends on inferences about communicative intent, goals and other mental states, much like learning from other sources, such as demonstration, observation and instruction. Because these abilities are especially developed in humans, the present account also explains why evaluative feedback is far more widespread in humans than non-human animals.
Recent grants
RI: Small: Understanding Value-based Multiagent Learning and Its Applications
NSF · $157k · 2013–2016
RI: Small: Collaborative Research: Speeding Up Learning through Modeling the Pragmatics of Training
NSF · $156k · 2013–2016
HSD-DRU: The Role of Communication in the Dynamics of Effective Decision Making
NSF · $685k · 2007–2011
RI: Collaborative Research: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
NSF · $241k · 2007–2012
ITR: Collaborative Research: Representation and Learning in Computational Game Theory
NSF · $375k · 2003–2009
Frequent coauthors
- 56 shared
James MacGlashan
- 41 shared
Mark K. Ho
New York University
- 38 shared
David Abel
- 35 shared
Kavosh Asadi
- 32 shared
Leslie Pack Kaelbling
- 29 shared
George Konidaris
Brown University
- 27 shared
Dilip Arumugam
Stanford University
- 22 shared
Stefanie Tellex
Brown University
Education
- 1991
Ph.D., Computer Science
Brown University
- 1987
M.S., Computer Science
Brown University
- 1985
B.S., Computer Science
Brown University