Pieter Abbeel

Assistant Professor

University of California, Berkeley · Electrical Engineering and Computer Sciences

Active 2002–2025

h-index: 156

Citations: 110.5k

Papers: 890 (345 in the last 5 years)

Funding: not listed


About

Professor Pieter Abbeel is the Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) Lab at UC Berkeley. His research focuses on building increasingly intelligent systems, pushing the frontiers of deep reinforcement learning, deep imitation learning, deep unsupervised learning, transfer learning, and meta-learning (learning to learn). He also studies the influence of AI on society and investigates how AI can advance other science and engineering disciplines. Abbeel has developed foundational AI classes that have been taken by over 100,000 students through edX, and his materials on Deep Reinforcement Learning and Deep Unsupervised Learning are considered standard references for AI researchers. He has founded three companies (Gradescope, Covariant, and Berkeley Open Arms) and advises numerous AI and robotics startups. He has received multiple awards, including the PECASE, NSF-CAREER, and DARPA-YFA, and his work is frequently featured in major press outlets. He holds a Ph.D. in Computer Science from Stanford University and an M.S. in Electrical Engineering from KU Leuven, Belgium.

Research signals


Research topics

Artificial Intelligence
Computer Science
Machine Learning
Mathematics
Engineering
Algorithm
Programming language
Applied mathematics
Electrical engineering
Computer graphics (images)
Physics
Human–computer interaction
Statistical physics
Control engineering
Statistics
Mathematical analysis

Selected publications

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
2025-06-10
article
Efficient tokenization of videos remains a challenge in training vision models that can process long videos. One promising direction is to develop a tokenizer that can encode long video clips, as it would enable the tokenizer to leverage the temporal coherence of videos better for tokenization. However, training existing tokenizers on long videos often incurs a huge training cost as they are trained to reconstruct all the frames at once. In this paper, we introduce CoordTok, a video tokenizer that learns a mapping from coordinate-based representations to the corresponding patches of input videos, inspired by recent advances in 3D generative models. In particular, CoordTok encodes a video into factorized triplane representations and reconstructs patches that correspond to randomly sampled (x, y, t) coordinates. This allows for training large tokenizer models directly on long videos without requiring excessive training resources. Our experiments show that CoordTok can drastically reduce the number of tokens for encoding long video clips. For instance, CoordTok can encode a 128-frame video with 128 × 128 resolution into 1280 tokens, while baselines need 6144 or 8192 tokens to achieve similar reconstruction quality. We further show that this efficient video tokenization enables memory-efficient training of a diffusion transformer that can generate 128 frames at once.
SkillBlender: Towards Versatile Humanoid Whole-Body Loco-Manipulation via Skill Blending
ArXiv.org · 2025-06-11
preprint · Open access
Humanoid robots hold significant potential in accomplishing daily tasks across diverse environments thanks to their flexibility and human-like morphology. Recent works have made significant progress in humanoid whole-body control and loco-manipulation leveraging optimal control or reinforcement learning. However, these methods require tedious task-specific tuning for each task to achieve satisfactory behaviors, limiting their versatility and scalability to diverse tasks in daily scenarios. To that end, we introduce SkillBlender, a novel hierarchical reinforcement learning framework for versatile humanoid loco-manipulation. SkillBlender first pretrains goal-conditioned task-agnostic primitive skills, and then dynamically blends these skills to accomplish complex loco-manipulation tasks with minimal task-specific reward engineering. We also introduce SkillBench, a parallel, cross-embodiment, and diverse simulated benchmark containing three embodiments, four primitive skills, and eight challenging loco-manipulation tasks, accompanied by a set of scientific evaluation metrics balancing accuracy and feasibility. Extensive simulated experiments show that our method significantly outperforms all baselines, while naturally regularizing behaviors to avoid reward hacking, resulting in more accurate and feasible movements for diverse loco-manipulation tasks in our daily scenarios. Our code and benchmark will be open-sourced to the community to facilitate future research. Project page: https://usc-gvl.github.io/SkillBlender-web/.
Rodrigues Network for Learning Robot Actions
ArXiv.org · 2025-06-03
preprint · Open access
Understanding and predicting articulated actions is important in robot learning. However, common architectures such as MLPs and Transformers lack inductive biases that reflect the underlying kinematic structure of articulated systems. To this end, we propose the Neural Rodrigues Operator, a learnable generalization of the classical forward kinematics operation, designed to inject kinematics-aware inductive bias into neural computation. Building on this operator, we design the Rodrigues Network (RodriNet), a novel neural architecture specialized for processing actions. We evaluate the expressivity of our network on two synthetic tasks on kinematic and motion prediction, showing significant improvements compared to standard backbones. We further demonstrate its effectiveness in two realistic applications: (i) imitation learning on robotic benchmarks with the Diffusion Policy, and (ii) single-image 3D hand reconstruction. Our results suggest that integrating structured kinematic priors into the network architecture improves action learning in various domains.
SEMDICE: Off-policy State Entropy Maximization via Stationary Distribution Correction Estimation
ArXiv.org · 2025-12-10
preprint · Open access · Senior author
In the unsupervised pre-training for reinforcement learning, the agent aims to learn a prior policy for downstream tasks without relying on task-specific reward functions. We focus on state entropy maximization (SEM), where the goal is to learn a policy that maximizes the entropy of the state stationary distribution. In this paper, we introduce SEMDICE, a principled off-policy algorithm that computes an SEM policy from an arbitrary off-policy dataset, which optimizes the policy directly within the space of stationary distributions. SEMDICE computes a single, stationary Markov state-entropy-maximizing policy from an arbitrary off-policy dataset. Experimental results demonstrate that SEMDICE outperforms baseline algorithms in maximizing state entropy while achieving the best adaptation efficiency for downstream tasks among SEM-based unsupervised RL pre-training methods.
GaussGym: An open-source real-to-sim framework for learning locomotion from pixels
ArXiv.org · 2025-10-17
preprint · Open access · Senior author
We present a novel approach for photorealistic robot simulation that integrates 3D Gaussian Splatting as a drop-in renderer within vectorized physics simulators such as IsaacGym. This enables unprecedented speed -- exceeding 100,000 steps per second on consumer GPUs -- while maintaining high visual fidelity, which we showcase across diverse tasks. We additionally demonstrate its applicability in a sim-to-real robotics setting. Beyond depth-based sensing, our results highlight how rich visual semantics improve navigation and decision-making, such as avoiding undesirable regions. We further showcase the ease of incorporating thousands of environments from iPhone scans, large-scale scene datasets (e.g., GrandTour, ARKit), and outputs from generative video models like Veo, enabling rapid creation of realistic training worlds. This work bridges high-throughput simulation and high-fidelity perception, advancing scalable and generalizable robot learning. All code and data will be open-sourced for the community to build upon. Videos, code, and data available at https://escontrela.me/gauss_gym/.
End-to-end RL Improves Dexterous Grasping Policies
ArXiv.org · 2025-09-19
preprint · Open access
This work explores techniques to scale up image-based end-to-end learning for dexterous grasping with an arm + hand system. Unlike state-based RL, vision-based RL is much more memory inefficient, resulting in relatively low batch sizes, which is not amenable for algorithms like PPO. Nevertheless, it is still an attractive method as unlike the more commonly used techniques which distill state-based policies into vision networks, end-to-end RL can allow for emergent active vision behaviors. We identify a key bottleneck in training these policies is the way most existing simulators scale to multiple GPUs using traditional data parallelism techniques. We propose a new method where we disaggregate the simulator and RL (both training and experience buffers) onto separate GPUs. On a node with four GPUs, we have the simulator running on three of them, and PPO running on the fourth. We are able to show that with the same number of GPUs, we can double the number of existing environments compared to the previous baseline of standard data parallelism. This allows us to train vision-based environments, end-to-end with depth, which were previously performing far worse with the baseline. We train and distill both depth and state-based policies into stereo RGB networks and show that depth distillation leads to better results, both in simulation and reality. This improvement is likely due to the observability gap between state and vision policies which does not exist when distilling depth policies into stereo RGB. We further show that the increased batch size brought about by disaggregated simulation also improves real world performance. When deploying in the real world, we improve upon the previous state-of-the-art vision-based results using our end-to-end policies.
Compute-Optimal Scaling for Value-Based Deep RL
ArXiv.org · 2025-08-20
preprint · Open access
As models grow larger and training them becomes expensive, it becomes increasingly important to scale training recipes not just to larger models and more data, but to do so in a compute-optimal manner that extracts maximal performance per unit of compute. While such scaling has been well studied for language modeling, reinforcement learning (RL) has received less attention in this regard. In this paper, we investigate compute scaling for online, value-based deep RL. These methods present two primary axes for compute allocation: model capacity and the update-to-data (UTD) ratio. Given a fixed compute budget, we ask: how should resources be partitioned across these axes to maximize sample efficiency? Our analysis reveals a nuanced interplay between model size, batch size, and UTD. In particular, we identify a phenomenon we call TD-overfitting: increasing the batch quickly harms Q-function accuracy for small models, but this effect is absent in large models, enabling effective use of large batch size at scale. We provide a mental model for understanding this phenomenon and build guidelines for choosing batch size and UTD to optimize compute usage. Our findings provide a grounded starting point for compute-optimal scaling in deep RL, mirroring studies in supervised learning but adapted to TD learning.
Demonstrating MuJoCo Playground
2025-06-21 · 3 citations
article · Open access · Senior author
We introduce MuJoCo Playground, a fully open-source framework for robot learning built with MJX, with the express goal of streamlining simulation, training, and sim-to-real transfer onto robots. With a simple pip install playground, researchers can train policies in minutes on a single GPU. Playground supports diverse robotic platforms, including quadrupeds, humanoids, dexterous hands, and robotic arms, enabling zero-shot sim-to-real transfer from both state and pixel inputs. This is achieved through an integrated stack comprising a physics engine, batch renderer, and training environments. Along with video results, the entire framework is freely available at playground.mujoco.org.
Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields
ArXiv.org · 2025-11-10
preprint · Open access · Senior author
Despite years of research, real-time diverse grasp synthesis for dexterous hands remains an unsolved core challenge in robotics and computer graphics. We present Lightning Grasp, a novel high-performance procedural grasp synthesis algorithm that achieves orders-of-magnitude speedups over state-of-the-art approaches, while enabling unsupervised grasp generation for irregular, tool-like objects. The method avoids many limitations of prior approaches, such as the need for carefully tuned energy functions and sensitive initialization. This breakthrough is driven by a key insight: decoupling complex geometric computation from the search process via a simple, efficient data structure - the Contact Field. This abstraction collapses the problem complexity, enabling a procedural search at unprecedented speeds. We open-source our system to propel further innovation in robotic manipulation.
Learning to Design Soft Hands using Reward Models
ArXiv.org · 2025-10-20
preprint · Open access
Soft robotic hands promise to provide compliant and safe interaction with objects and environments. However, designing soft hands to be both compliant and functional across diverse use cases remains challenging. Although co-design of hardware and control better couples morphology to behavior, the resulting search space is high-dimensional, and even simulation-based evaluation is computationally expensive. In this paper, we propose a Cross-Entropy Method with Reward Model (CEM-RM) framework that efficiently optimizes tendon-driven soft robotic hands based on teleoperation control policy, reducing design evaluations by more than half compared to pure optimization while learning a distribution of optimized hand designs from pre-collected teleoperation data. We derive a design space for a soft robotic hand composed of flexural soft fingers and implement parallelized training in simulation. The optimized hands are then 3D-printed and deployed in the real world using both teleoperation data and real-time teleoperation. Experiments in both simulation and hardware demonstrate that our optimized design significantly outperforms baseline hands in grasping success rates across a diverse set of challenging objects.

Frequent coauthors

Sergey Levine
207 shared
Takayuki Osa
The University of Tokyo
66 shared
J. Andrew Bagnell
65 shared
Joni Pajarinen
65 shared
Gerhard Neumann
65 shared
Jan Peters
Technical University of Darmstadt
65 shared
Aviv Tamar
Technion – Israel Institute of Technology
56 shared
Kimin Lee
55 shared

Education

Ph.D., Computer Science
Stanford University
2008
M.S., Electrical Engineering and Computer Sciences
University of California, Berkeley
2003
B.S., Electrical Engineering and Computer Sciences
University of California, Berkeley
2001

Awards & honors

PECASE
NSF-CAREER
ONR-YIP
DARPA-YFA
TR35
