
Animesh Garg
VerifiedGeorgia Institute of Technology · Computer Science
Active 2011–2025
About
Animesh Garg is a Stephen Fleming Early Career Professor at the School of Interactive Computing at Georgia Tech. He leads the People, AI, and Robotics (PAIR) research group and is on the core faculty in the Robotics and Machine Learning programs. He is also a Senior Researcher at Nvidia Research. Animesh earned a Ph.D. from UC Berkeley and completed a postdoctoral fellowship at the Stanford AI Lab. He is on leave from the Department of Computer Science at the University of Toronto and holds a CIFAR Chair position at the Vector Institute. His work aims to build Generalizable Autonomy, involving a confluence of representations and algorithms for reinforcement learning, control, and perception. His current research focuses on learning structured inductive biases in sequential decision making, using data-driven causal discovery, and transferring these capabilities to real robots, all within the scope of embodied systems.
Research topics
- Computer science
- Artificial intelligence
- Machine learning
- Human–computer interaction
- Computer vision
Selected publications
CLIMB: Language-Guided Continual Learning for Task Planning with Iterative Model Building
2025-05-19
articleSenior authorIntelligent and reliable task planning is a core capability for generalized robotics, which requires a descriptive domain representation that sufficiently models all object and state information for the scene. We present CLIMB, a continual learning framework for robot task planning that leverages foundation models and feedback from execution to guide the construction of domain models. CLIMB can build a model from a natural language description, learn non-obvious predicates while solving tasks, and store that information for future problems. We demonstrate the ability of CLIMB to improve performance in common planning environments compared to baseline methods. We also developed the BlocksWorld++ domain, a simulated environment with an easily usable real counterpart, together with a curriculum of tasks with progressing difficulty to evaluate continual learning. Code and additional details for this system can be found at https://plan-with-climb.github.io/.
2025-05-19 · 2 citations
articleSenior authorBehavior cloning facilitates the learning of dexterous manipulation skills, yet the complexity of surgical environments, the difficulty and expense of obtaining patient data, and robot calibration errors present unique challenges for surgical robot learning. We provide an enhanced surgical digital twin with photorealistic human anatomical organs, integrated into a comprehensive simulator designed to generate high-quality synthetic data to solve fundamental tasks in surgical autonomy. We present SuFIA-BC: visual Behavior Cloning policies for Surgical First Interactive Autonomy Assistants. We investigate visual observation spaces including multi-view cameras and 3D visual representations extracted from a single endoscopic camera view. Through systematic evaluation, we find that the diverse set of photorealistic surgical tasks introduced in this work enables a comprehensive evaluation of prospective behavior cloning models for the unique challenges posed by surgical environments. We observe that current state-of-the-art behavior cloning techniques struggle to solve the contact-rich and complex tasks evaluated in this work, regardless of their underlying perception or control architectures. These findings highlight the importance of customizing perception pipelines and control architectures, as well as curating larger-scale synthetic datasets that meet the specific demands of surgical tasks. Project website: orbit-surgical.github.io/sufia-bc/
Science Robotics · 2025-09-24 · 10 citations
articleScience laboratory automation enables accelerated discovery in life sciences and materials. However, it requires interdisciplinary collaboration to address challenges such as robust and flexible autonomy, reproducibility, throughput, standardization, the role of human scientists, and ethics. This article highlights these issues, reflecting perspectives from leading experts in laboratory automation across different disciplines of the natural sciences.
OG-VLA: Orthographic Image Generation for 3D-Aware Vision-Language Action Model
arXiv (Cornell University) · 2025-06-01
preprintOpen accessWe introduce OG-VLA, a novel architecture and learning framework that combines the generalization strengths of Vision Language Action models (VLAs) with the robustness of 3D-aware policies. We address the challenge of mapping natural language instructions and one or more RGBD observations to quasi-static robot actions. 3D-aware robot policies achieve state-of-the-art performance on precise robot manipulation tasks, but struggle with generalization to unseen instructions, scenes, and objects. On the other hand, VLAs excel at generalizing across instructions and scenes, but can be sensitive to camera and robot pose variations. We leverage prior knowledge embedded in language and vision foundation models to improve generalization of 3D-aware keyframe policies. OG-VLA unprojects input observations from diverse views into a point cloud which is then rendered from canonical orthographic views, ensuring input view invariance and consistency between input and output spaces. These canonical views are processed with a vision backbone, a Large Language Model (LLM), and an image diffusion model to generate images that encode the next position and orientation of the end-effector on the input scene. Evaluations on the Arnold and Colosseum benchmarks demonstrate state-of-the-art generalization to unseen environments, with over 40% relative improvements while maintaining robust performance in seen settings. We also show real-world adaption in 3 to 5 demonstrations along with strong generalization. Videos and resources at https://og-vla.github.io/
MATTERIX: toward a digital twin for robotics-assisted chemistry laboratory automation
Nature Computational Science · 2025-12-31 · 1 citations
articleOpen accessSenior authorScaling Laws in Scientific Discovery with AI and Robot Scientists
ArXiv.org · 2025-03-28
preprintOpen accessScientific discovery is poised for rapid advancement through advanced robotics and artificial intelligence. Current scientific practices face substantial limitations as manual experimentation remains time-consuming and resource-intensive, while multidisciplinary research demands knowledge integration beyond individual researchers' expertise boundaries. Here, we envision an autonomous generalist scientist (AGS) concept combines agentic AI and embodied robotics to automate the entire research lifecycle. This system could dynamically interact with both physical and virtual environments while facilitating the integration of knowledge across diverse scientific disciplines. By deploying these technologies throughout every research stage -- spanning literature review, hypothesis generation, experimentation, and manuscript writing -- and incorporating internal reflection alongside external feedback, this system aims to significantly reduce the time and resources needed for scientific discovery. Building on the evolution from virtual AI scientists to versatile generalist AI-based robot scientists, AGS promises groundbreaking potential. As these autonomous systems become increasingly integrated into the research process, we hypothesize that scientific discovery might adhere to new scaling laws, potentially shaped by the number and capabilities of these autonomous systems, offering novel perspectives on how knowledge is generated and evolves. The adaptability of embodied robots to extreme environments, paired with the flywheel effect of accumulating scientific knowledge, holds the promise of continually pushing beyond both physical and intellectual frontiers.
“Data will solve robotics and automation: True or false?”: A debate
Science Robotics · 2025-08-27
reviewLeading researchers debate the long-term influence of model-free methods that use large sets of demonstration data to train numerical generative models to control robots.
AnyPlace: Learning Generalized Object Placement for Robot Manipulation
ArXiv.org · 2025-02-06 · 4 citations
preprintOpen accessSenior authorObject placement in robotic tasks is inherently challenging due to the diversity of object geometries and placement configurations. To address this, we propose AnyPlace, a two-stage method trained entirely on synthetic data, capable of predicting a wide range of feasible placement poses for real-world tasks. Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently. For training, we generate a fully synthetic dataset of randomly generated objects in different placement configurations (insertion, stacking, hanging) and train local placement-prediction models. We conduct extensive evaluations in simulation, demonstrating that our method outperforms baselines in terms of success rate, coverage of possible placement modes, and precision. In real-world experiments, we show how our approach directly transfers models trained purely on synthetic data to the real world, where it successfully performs placements in scenarios where other models struggle -- such as with varying object geometries, diverse placement modes, and achieving high precision for fine placement. More at: https://any-place.github.io.
ACGD: Visual Multitask Policy Learning with Asymmetric Critic Guided Distillation
2025-10-19
articleSenior authorWe present Asymmetric Critic Guided Distillation, ACGD, a framework for learning multi-task dexterous manipulation policies that can manipulate articulated objects using images as input. ACGD is a scalable student-teacher distillation approach that utilizes behavior cloning to distill multiple expert policies into a single vision-based, multi-task student policy for dexterous manipulation. The expert policies are trained with traditional RL techniques with access to privileged state information of both the robot and the manipulated object, while the distilled student policy operates under realistic sensory constraints, specifically using only camera images and robot proprioception. During distillation, we use an expert-critic that provides action labels and value estimates to refine the student’s action sampling through a dual IL/RL objective. In the multi-task setting, we achieve this through an aggregate critic for different single-task experts. Our approach exhibits strong performance compared to a number of state-of-the-art imitation learning (IL) and reinforcement learning (RL) baselines. We evaluate across a variety of multi-task dexterous manipulation benchmarks including bimanual manipulation, single-hand object articulation tasks, and a tendon-actuated hand and achieves state-of-the-art performance with 10-15% improvement over the baseline algorithms. Visit our website for more details.
Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025
ArXiv.org · 2025-04-13
preprintOpen accessPeer review at AI conferences is stressed by rapidly rising submission volumes, leading to deteriorating review quality and increased author dissatisfaction. To address these issues, we developed Review Feedback Agent, a system leveraging multiple large language models (LLMs) to improve review clarity and actionability by providing automated feedback on vague comments, content misunderstandings, and unprofessional remarks to reviewers. Implemented at ICLR 2025 as a large randomized control study, our system provided optional feedback to more than 20,000 randomly selected reviews. To ensure high-quality feedback for reviewers at this scale, we also developed a suite of automated reliability tests powered by LLMs that acted as guardrails to ensure feedback quality, with feedback only being sent to reviewers if it passed all the tests. The results show that 27% of reviewers who received feedback updated their reviews, and over 12,000 feedback suggestions from the agent were incorporated by those reviewers. This suggests that many reviewers found the AI-generated feedback sufficiently helpful to merit updating their reviews. Incorporating AI feedback led to significantly longer reviews (an average increase of 80 words among those who updated after receiving feedback) and more informative reviews, as evaluated by blinded researchers. Moreover, reviewers who were selected to receive AI feedback were also more engaged during paper rebuttals, as seen in longer author-reviewer discussions. This work demonstrates that carefully designed LLM-generated review feedback can enhance peer review quality by making reviews more specific and actionable while increasing engagement between reviewers and authors. The Review Feedback Agent is publicly available at https://github.com/zou-group/review_feedback_agent.
Frequent coauthors
- 61 shared
Silvio Savarese
- 50 shared
Dieter Fox
- 48 shared
Yuke Zhu
The University of Texas at Austin
- 35 shared
Li Fei-Fei
- 31 shared
Arthur Allshire
- 31 shared
Samarth Sinha
University of Toronto
- 31 shared
Florian Shkurti
University of Toronto
- 29 shared
Ken Goldberg
University of California, Berkeley
Labs
People, AI, and Robotics (PAIR) research groupPI
- Resume-aware match score
- Save to shortlist
- AI-drafted outreach
See your match with Animesh Garg
PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.
- Free to start
- No credit card
- 30-second signup