Abhishek Gupta

Professor · Verified

University of Washington · Computer Science & Engineering

Active 2008–2026

h-index: 36

Citations: 6.9k

Papers: 192 (87 in the last 5 years)

Funding: $400k



About

Abhishek Gupta is an assistant professor in computer science and engineering at the Paul G. Allen School at the University of Washington, where he leads the Washington Embodied Intelligence and Robotics Development (WEIRD) Lab. His main research goal is to develop algorithms that enable robotic systems to learn how to perform complex tasks in a variety of unstructured environments like offices and homes. To achieve this, he works towards building deep reinforcement learning algorithms capable of learning in the real world, with and around humans. Previously, Gupta was a post-doctoral scholar at MIT, collaborating with Russ Tedrake and Pulkit Agrawal. He completed his Ph.D. in machine learning and robotics at Berkeley Artificial Intelligence Research (BAIR) at UC Berkeley, advised by Professor Sergey Levine and Professor Pieter Abbeel. He also completed his bachelor’s degree at UC Berkeley.

Research topics

Artificial Intelligence
Computer Science
Social psychology
Simulation
Mathematics
Psychology
Engineering
Mathematics education
Automotive engineering

Selected publications

Spatially Correlated Blockage Aware Placement of RIS in IIoT Networks
IEEE Transactions on Wireless Communications · 2026-01-01
article · Senior author
Masquerade: Simple and Lightweight Transaction Reordering Mitigation in Blockchains
Distributed Ledger Technologies Research and Practice · 2025-04-21 · 1 citation
article · Senior author
Blockchains offer strong security guarantees, but cannot protect users against the ordering of transactions. Players such as miners, bots, and validators can reorder transactions to reap significant profits, called the maximal extractable value (MEV). In this article, we propose an MEV aware protocol design called Masquerade and show that it will increase user rewards in the system by using a strict per-transaction level of ordering to ensure that a transaction is committed either way even if it is revealed. In this protocol, we introduce the notion of a token to mitigate the actions taken by an adversary in an attack scenario. Such tokens can be purchased voluntarily by users, who can then choose to include the token numbers in their transactions. If the users include the token in their transactions, then our protocol requires the block-builder to order the transactions strictly according to token numbers. We show through extensive simulations that this reduces the probability that the adversaries can benefit from MEV transactions as compared to existing practices. We show that successful MEV attacks decrease by about 70% on average.
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
ArXiv.org · 2025-02-08 · 1 citation
preprint · Open access
Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is the lack of robotic data, which are typically obtained through expensive on-robot operation. A promising remedy is to leverage cheaper, off-domain data such as action-free videos, hand-drawn sketches or simulation data. In this work, we posit that hierarchical vision-language-action (VLA) models can be more effective in utilizing off-domain data than standard monolithic VLA models that directly finetune vision-language models (VLMs) to predict actions. In particular, we study a class of hierarchical VLA models, where the high-level VLM is finetuned to produce a coarse 2D path indicating the desired robot end-effector trajectory given an RGB image and a task description. The intermediate 2D path prediction is then served as guidance to the low-level, 3D-aware control policy capable of precise manipulation. Doing so alleviates the high-level VLM from fine-grained action prediction, while reducing the low-level policy's burden on complex task-level reasoning. We show that, with the hierarchical design, the high-level VLM can transfer across significant domain gaps between the off-domain finetuning data and real-robot testing scenarios, including differences on embodiments, dynamics, visual appearances and task semantics, etc. In the real-robot experiments, we observe an average of 20% improvement in success rate across seven different axes of generalization over OpenVLA, representing a 50% relative gain. Visual results, code, and dataset are provided at: https://hamster-robot.github.io/
Rapidly Adapting Policies to the Real World via Simulation-Guided Fine-Tuning
ArXiv.org · 2025-02-04
preprint · Open access · Senior author
Robot learning requires a considerable amount of high-quality data to realize the promise of generalization. However, large data sets are costly to collect in the real world. Physics simulators can cheaply generate vast data sets with broad coverage over states, actions, and environments. However, physics engines are fundamentally misspecified approximations to reality. This makes direct zero-shot transfer from simulation to reality challenging, especially in tasks where precise and force-sensitive manipulation is necessary. Thus, fine-tuning these policies with small real-world data sets is an appealing pathway for scaling robot learning. However, current reinforcement learning fine-tuning frameworks leverage general, unstructured exploration strategies which are too inefficient to make real-world adaptation practical. This paper introduces the Simulation-Guided Fine-tuning (SGFT) framework, which demonstrates how to extract structural priors from physics simulators to substantially accelerate real-world adaptation. Specifically, our approach uses a value function learned in simulation to guide real-world exploration. We demonstrate this approach across five real-world dexterous manipulation tasks where zero-shot sim-to-real transfer fails. We further demonstrate our framework substantially outperforms baseline fine-tuning methods, requiring up to an order of magnitude fewer real-world samples and succeeding at difficult tasks where prior approaches fail entirely. Last but not least, we provide theoretical justification for this new paradigm which underpins how SGFT can rapidly learn high-performance policies in the face of large sim-to-real dynamics gaps. Project webpage: https://weirdlabuw.github.io/sgft/
Multimodal Emotion Recognition Using Facial and Audio Features
Lecture notes in networks and systems · 2025-01-01 · 2 citations
book-chapter
Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning
2025-01-01
article · Open access
AMPS: ASR with Multimodal Paraphrase Supervision
2025-01-01
article · Open access · 1st author · Corresponding
Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, Preethi Jyothi. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2025.
Steering Your Diffusion Policy with Latent Space Reinforcement Learning
ArXiv.org · 2025-06-18
preprint · Open access
Robotic control policies learned from human demonstrations have achieved impressive results in many real-world applications. However, in scenarios where initial performance is not satisfactory, as is often the case in novel open-world settings, such behavioral cloning (BC)-learned policies typically require collecting additional human demonstrations to further improve their behavior -- an expensive and time-consuming process. In contrast, reinforcement learning (RL) holds the promise of enabling autonomous online policy improvement, but often falls short of achieving this due to the large number of samples it typically requires. In this work we take steps towards enabling fast autonomous adaptation of BC-trained policies via efficient real-world RL. Focusing in particular on diffusion policies -- a state-of-the-art BC methodology -- we propose diffusion steering via reinforcement learning (DSRL): adapting the BC policy by running RL over its latent-noise space. We show that DSRL is highly sample efficient, requires only black-box access to the BC policy, and enables effective real-world autonomous policy improvement. Furthermore, DSRL avoids many of the challenges associated with finetuning diffusion policies, obviating the need to modify the weights of the base policy at all. We demonstrate DSRL on simulated benchmarks, real-world robotic tasks, and for adapting pretrained generalist policies, illustrating its sample efficiency and effective performance at real-world policy improvement.
SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
ArXiv.org · 2025-03-06
preprint · Open access
Enabling robots to learn novel tasks in a data-efficient manner is a long-standing challenge. Common strategies involve carefully leveraging prior experiences, especially transition data collected on related tasks. Although much progress has been made for general pick-and-place manipulation, far fewer studies have investigated contact-rich assembly tasks, where precise control is essential. We introduce SRSA (Skill Retrieval and Skill Adaptation), a novel framework designed to address this problem by utilizing a pre-existing skill library containing policies for diverse assembly tasks. The challenge lies in identifying which skill from the library is most relevant for fine-tuning on a new task. Our key hypothesis is that skills showing higher zero-shot success rates on a new task are better suited for rapid and effective fine-tuning on that task. To this end, we propose to predict the transfer success for all skills in the skill library on a novel task, and then use this prediction to guide the skill retrieval process. We establish a framework that jointly captures features of object geometry, physical dynamics, and expert actions to represent the tasks, allowing us to efficiently learn the transfer success predictor. Extensive experiments demonstrate that SRSA significantly outperforms the leading baseline. When retrieving and fine-tuning skills on unseen tasks, SRSA achieves a 19% relative improvement in success rate, exhibits 2.6x lower standard deviation across random seeds, and requires 2.4x fewer transition samples to reach a satisfactory success rate, compared to the baseline. Furthermore, policies trained with SRSA in simulation achieve a 90% mean success rate when deployed in the real world. Please visit our project webpage https://srsa2024.github.io/.
VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation
ArXiv.org · 2025-10-23
preprint · Open access · Senior author
A fundamental challenge in robot navigation lies in learning policies that generalize across diverse environments while conforming to the unique physical constraints and capabilities of a specific embodiment (e.g., quadrupeds can walk up stairs, but rovers cannot). We propose VAMOS, a hierarchical VLA that decouples semantic planning from embodiment grounding: a generalist planner learns from diverse, open-world data, while a specialist affordance model learns the robot's physical constraints and capabilities in safe, low-cost simulation. We enabled this separation by carefully designing an interface that lets a high-level planner propose candidate paths directly in image space that the affordance model then evaluates and re-ranks. Our real-world experiments show that VAMOS achieves higher success rates in both indoor and complex outdoor navigation than state-of-the-art model-based and end-to-end learning methods. We also show that our hierarchical design enables cross-embodied navigation across legged and wheeled robots and is easily steerable using natural language. Real-world ablations confirm that the specialist model is key to embodiment grounding, enabling a single high-level planner to be deployed across physically distinct wheeled and legged robots. Finally, this model significantly enhances single-robot reliability, achieving 3X higher success rates by rejecting physically infeasible plans. Website: https://vamos-vla.github.io/

Recent grants

Collaborative Research: Smarter Markets for a Smarter Grid: Pricing Randomness, Flexibility and Risk
NSF · $225k · 2016–2020
CRII: CPS SaTC: Securing Smart Cyberphysical Systems against Man-in-the-Middle Attacks
NSF · $175k · 2016–2019

Frequent coauthors

Sergey Levine
37 shared
Pieter Abbeel
University of California, Berkeley
22 shared
Tamer Başar
21 shared
Laxmikant V. Kalé
University of Illinois Urbana-Champaign
19 shared
Cédric Langbort
15 shared
Vikash Kumar
14 shared
Jayanth Reddy Regatti
The Ohio State University
10 shared
Ness B. Shroff
10 shared

Education

B.S.
UC Berkeley
Ph.D., machine learning and robotics
UC Berkeley
