Michael L. Littman

University Professor of Computer Science · Verified

Brown University · Computer Science

Active 1957–2025

h-index: 83
Citations: 45.4k
Papers: 454 · 84 last 5y
Funding: $3.6M
About

Michael L. Littman is a University Professor of Computer Science and serves as the Associate Provost for Artificial Intelligence at Brown University. His primary research areas include Artificial Intelligence, Machine Learning, Reinforcement Learning, and Robotics, with a secondary focus on Algorithmic Fairness. Dr. Littman's work involves advancing the understanding and development of intelligent systems, contributing to the fields of AI and machine learning through research and leadership. His role at Brown encompasses both academic and administrative responsibilities, supporting the university's initiatives in artificial intelligence and related disciplines.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Machine Learning
  • Sociology
  • Political Science
  • Psychology
  • Algorithm
  • Management science
  • Mathematical optimization
  • Cognitive psychology
  • Cognitive science
  • Mathematical analysis
  • Mathematics
  • Human–computer interaction
  • Library science

Selected publications

  • Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

    2025-01-01 · 2 citations

    article · Open access

    Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael Littman, Stephen Bach. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025.

  • Enabling End Users to Program Robots Using Reinforcement Learning

    2025-03-04

    article

    Reinforcement learning (RL) is a powerful learning technique in robotics, where people can specify rewards that robots learn how to maximize through a process of trial and error. Despite the numerous advantages of RL to robot programming, no approaches to our knowledge have sought to enable non-technical users to specify RL programs for robots. In this work, we designed two novel RL-based robot programming paradigms for non-technical users: Full MDP Programming (Full-MDP) and Goal-Only MDP Programming (Goal-MDP). To evaluate the efficacy of these two approaches, we ran a between-subjects online user study ($N = 409$) where participants were asked to program a simulated robot to complete example household tasks (e.g., delivering coffee) using one of our RL programming paradigms or a commonly used baseline: Sequential Programming (Seq), or Trigger-Action Programming (TAP). While users neither performed well nor reported positive experiences with the Full-MDP interface, user performance and experience with Goal-MDP was similar to the baselines (Seq and TAP) with significantly shorter programs. These results demonstrate that RL-based paradigms like Goal-MDP are a viable alternative to more traditional approaches and provide a starting point for robot programming interfaces that allow end-users to leverage the myriad benefits of RL for programming robots.

  • Knowledge Retention for Continual Model-Based Reinforcement Learning

    ArXiv.org · 2025-03-06

    preprint · Open access

    We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.

  • How Humans Communicate Programming Tasks in Natural Language and Implications For End-User Programming with LLMs

    2025-04-24 · 2 citations

    article

    Large language models (LLMs) like GPT-4 can convert natural-language descriptions of a task into computer code, making them a promising interface for end-user programming. We undertake a systematic analysis of how people with and without programming experience describe information-processing tasks (IPTs) in natural language, focusing on the characteristics of successful communication. Across two online between-subjects studies, we paired crowdworkers either with one another or with an LLM, asking senders (always humans) to communicate IPTs in natural language to their receiver (either a human or LLM). Both senders and receivers tried to answer test cases, the latter based on their sender’s description. While participants with programming experience tended to communicate IPTs more successfully than non-programmers, this advantage was not overwhelming. Furthermore, a user interface that solicited example test cases from senders often, but not always, improved IPT communication. Allowing receivers to request clarification, though, was less successful at improving communication.

  • Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages

    arXiv (Cornell University) · 2024-07-03

    preprint · Open access

    Recent works have explored using language models for planning problems. One approach examines translating natural language descriptions of planning tasks into structured planning languages, such as the planning domain definition language (PDDL). Existing evaluation methods struggle to ensure semantic correctness and rely on simple or unrealistic datasets. To bridge this gap, we introduce Planetarium, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks. Planetarium features a novel PDDL equivalence algorithm that flexibly evaluates the correctness of generated PDDL, along with a dataset of 145,918 text-to-PDDL pairs across 73 unique state combinations with varying levels of difficulty. Finally, we evaluate several API-access and open-weight language models that reveal this task's complexity. For example, 96.1% of the PDDL problem descriptions generated by GPT-4o are syntactically parseable, 94.4% are solvable, but only 24.8% are semantically correct, highlighting the need for a more rigorous benchmark for this problem.

  • Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

    2024-01-01 · 2 citations

    article
  • Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

    arXiv (Cornell University) · 2024-07-10

    preprint · Open access

    Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to, or knowledge of, an underlying, unobservable state space. Our metric, the $λ$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($λ$) with a different value of $λ$. Since TD($λ = 0$) makes an implicit Markov assumption and TD($λ = 1$) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $λ$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $λ$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $λ$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.

  • A domain-agnostic approach for characterization of lifelong learning systems

    Neural Networks · 2023-01-20 · 16 citations

    article · Open access
  • NSF on Chien's Grand Challenge for Sustainability

    Communications of the ACM · 2023-04-21 · 3 citations

    article

  • Social is special: A normative framework for teaching with and learning from evaluative feedback

    2023-05-26 · 4 citations

    preprint · Open access

    Humans often attempt to influence one another’s behavior using rewards and punishments. How does this work? Psychologists have often assumed that “evaluative feedback” influences behavior via standard learning mechanisms that learn from environmental contingencies. On this view, teaching with evaluative feedback involves leveraging learning systems designed to maximize an organism’s positive outcomes. Yet, despite its parsimony, programs of research predicated on this assumption, such as ones in developmental psychology, animal behavior, and human-robot interaction, have had limited success. We offer an explanation by analyzing the logic of evaluative feedback and show that specialized learning mechanisms are uniquely favored in the case of evaluative feedback from a social partner. Specifically, evaluative feedback works best when it is treated as communicating information about the value of an action rather than as a form of reward to be maximized. This account suggests that human learning from evaluative feedback depends on inferences about communicative intent, goals and other mental states—much like learning from other sources, such as demonstration, observation and instruction. Because these abilities are especially developed in humans, the present account also explains why evaluative feedback is far more widespread in humans than non-human animals.

Education

  • Ph.D., Computer Science

    Brown University

    1991
  • M.S., Computer Science

    Brown University

    1987
  • B.S., Computer Science

    Brown University

    1985