
Mac Schwager
Associate Professor of Aeronautics and Astronautics · Stanford University · Aeronautics and Astronautics
Active 2005–2025
About
Mac Schwager is an Associate Professor of Aeronautics and Astronautics at Stanford University, with a courtesy appointment in Computer Science. His research focuses on autonomous systems, controls, and their applications in aerospace and transportation. As a faculty member at Stanford's Department of Aeronautics and Astronautics, he contributes to advancing the understanding and development of intelligent systems that can operate independently in complex environments. His work is integral to the department's efforts in autonomous systems and controls, supporting innovations in future aircraft design, space exploration, and transportation technologies.
Research topics
- Computer Science
- Artificial Intelligence
- Mathematical optimization
- Mathematics
- Algorithm
- Mathematical economics
- Mathematical analysis
- Control engineering
- Distributed computing
- Physics
- Engineering
- Real-time computing
Selected publications
ArXiv.org · 2025-03-06
preprint · Open access · Senior author
Autonomous visual navigation is an essential element in robot autonomy. Reinforcement learning (RL) offers a promising policy training paradigm. However, existing RL methods suffer from high sample complexity, poor sim-to-real transfer, and limited runtime adaptability to navigation scenarios not seen during training. These problems are particularly challenging for drones, with complex nonlinear and unstable dynamics, and strong dynamic coupling between control and perception. In this paper, we propose a novel framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL) to train vision-based drone navigation policies. By leveraging high-fidelity 3D scene representations and differentiable simulation, our method improves sample efficiency and sim-to-real transfer. Additionally, we incorporate a Context-aided Estimator Network (CENet) to adapt to environmental variations at runtime. Moreover, by curriculum training in a mixture of different surrounding environments, we achieve in-task generalization, the ability to solve new instances of a task not seen during training. Drone hardware experiments demonstrate our method's high training efficiency compared to state-of-the-art RL methods, zero-shot sim-to-real transfer for real robot deployment without fine-tuning, and ability to adapt to new instances within the same task class (e.g. to fly through a gate at different locations with different distractors in the environment). Our simulator and training framework are open-sourced at: https://github.com/Qianzhong-Chen/grad_nav.
Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives
ArXiv.org · 2025-05-09
preprint · Open access · Senior author
Diffusion policies have demonstrated remarkable dexterity and robustness in intricate, high-dimensional robot manipulation tasks, while training from a small number of demonstrations. However, the reason for this performance remains a mystery. In this paper, we offer a surprising hypothesis: diffusion policies essentially memorize an action lookup table -- and this is beneficial. We posit that, at runtime, diffusion policies find the closest training image to the test image in a latent space, and recall the associated training action sequence, offering reactivity without the need for action generalization. This is effective in the sparse data regime, where there is not enough data density for the model to learn action generalization. We support this claim with systematic empirical evidence. Even when conditioned on wildly out-of-distribution (OOD) images of cats and dogs, the Diffusion Policy still outputs an action sequence from the training data. With this insight, we propose a simple policy, the Action Lookup Table (ALT), as a lightweight alternative to the Diffusion Policy. Our ALT policy uses a contrastive image encoder as a hash function to index the closest corresponding training action sequence, explicitly performing the computation that the Diffusion Policy implicitly learns. We show empirically that for relatively small datasets, ALT matches the performance of a diffusion model, while requiring only 0.0034 of the inference time and 0.0085 of the memory footprint, allowing for much faster closed-loop inference with resource-constrained robots. We also train our ALT policy to give an explicit OOD flag when the runtime image is too far in the latent space from the training images, giving a simple but effective runtime monitor. More information can be found at: https://stanfordmsl.github.io/alt/.
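The lookup idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the latents are assumed to be already computed (the paper uses a contrastive image encoder, omitted here), Euclidean distance is an assumed metric, and `nearest_action`, `ood_threshold`, and the toy data are hypothetical names chosen for illustration.

```python
import math

def nearest_action(query, train_latents, train_actions, ood_threshold):
    """Return (action_sequence, is_ood) for the training latent closest to query.

    Hypothetical helper: assumes latents are precomputed and compared with
    Euclidean distance, which the abstract does not specify.
    """
    best_idx, best_dist = -1, math.inf
    for i, z in enumerate(train_latents):
        d = math.dist(query, z)  # distance in latent space
        if d < best_dist:
            best_idx, best_dist = i, d
    # Flag the query as out of distribution when even the closest
    # training latent is farther away than the threshold.
    return train_actions[best_idx], best_dist > ood_threshold

# Toy data: made-up 2-D latents, each paired with one action sequence.
train_latents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
train_actions = [["stay"], ["right"], ["up"]]

action, is_ood = nearest_action((0.9, 0.1), train_latents, train_actions,
                                ood_threshold=0.5)
```

A linear scan is enough for a toy table; a larger table would likely need an approximate nearest-neighbor index, which is outside this sketch.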
Latent Theory of Mind: A Decentralized Diffusion Architecture for Cooperative Manipulation
ArXiv.org · 2025-05-14
preprint · Open access
We present Latent Theory of Mind (LatentToM), a decentralized diffusion policy architecture for collaborative robot manipulation. Our policy allows multiple manipulators with their own perception and computation to collaborate with each other towards a common task goal with or without explicit communication. Our key innovation lies in allowing each agent to maintain two latent representations: an ego embedding specific to the robot, and a consensus embedding trained to be common to both robots, despite their different sensor streams and poses. We further let each robot train a decoder to infer the other robot's ego embedding from their consensus embedding, akin to theory of mind in latent space. Training occurs centrally, with all the policies' consensus encoders supervised by a loss inspired by sheaf theory, a mathematical theory for clustering data on a topological manifold. Specifically, we introduce a first-order cohomology loss to enforce sheaf-consistent alignment of the consensus embeddings. To preserve the expressiveness of the consensus embedding, we further propose structural constraints based on theory of mind and a directional consensus mechanism. Execution can be fully distributed, requiring no explicit communication between policies. In that case, information is exchanged implicitly through each robot's sensor stream by observing the actions of the other robots and their effects on the scene. Alternatively, execution can leverage direct communication to share the robots' consensus embeddings, where the embeddings are shared once during each inference step and are aligned using the sheaf Laplacian. In our hardware experiments, LatentToM outperforms a naive decentralized diffusion baseline, and shows comparable performance with a state-of-the-art centralized diffusion policy for bi-manual manipulation. Project website: https://stanfordmsl.github.io/LatentToM/.
Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones
Lecture notes in computer science · 2025-01-01 · 1 citations
book-chapter
SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum
IEEE Robotics and Automation Letters · 2025-03-22 · 3 citations
article · Senior author
We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only onboard perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k–300k image/state-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level thrust and body rate commands at 20 Hz onboard a drone. Crucially, SV-Net includes a learned module for low-level control that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field.
Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps
IEEE Transactions on Robotics · 2025-01-01 · 22 citations
article · Senior author
We present Splat-Nav, a real-time robot navigation pipeline for Gaussian splatting (GSplat) scenes, a powerful new 3-D scene representation. Splat-Nav consists of two components: first, Splat-Plan, a safe planning module, and second, Splat-Loc, a robust vision-based pose estimation module. Splat-Plan builds a safe-by-construction polytope corridor through the map based on mathematically rigorous collision constraints and then constructs a Bézier curve trajectory through this corridor. Splat-Loc provides real-time recursive state estimates given only an RGB feed from an on-board camera, leveraging the point-cloud representation inherent in GSplat scenes. Working together, these modules give robots the ability to recursively replan smooth and safe trajectories to goal locations. Goals can be specified with position coordinates, or with language commands by using a semantic GSplat. We demonstrate improved safety compared to point cloud-based methods in extensive simulation experiments. In a total of 126 hardware flights, we demonstrate equivalent safety and speed compared to motion capture and visual odometry, but without a manual frame alignment required by those methods. We show online replanning at more than 2 Hz and pose estimation at about 25 Hz, an order of magnitude faster than neural radiance field-based navigation methods, thereby enabling real-time navigation.
A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps
2025-05-19 · 5 citations
article · Senior author
SAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive safety filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Only a small fraction of the total compute time consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source software package built with ROS for real-time GSplat mapping for robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot invoke a collision. Our videos and codebase can be found at https://chengine.github.io/safer-splat.
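As a rough illustration of a minimally invasive CBF safety filter in general, the sketch below treats the textbook case of single-integrator dynamics (velocity is the control) and one spherical obstacle. It is not the SAFER-Splat barrier function, which handles Gaussian primitives at scale; `cbf_filter`, `alpha`, and the obstacle model are assumptions made for illustration.

```python
def cbf_filter(x, u_des, center, radius, alpha=1.0):
    """Return the control closest to u_des that satisfies one CBF constraint.

    Illustrative assumptions: dynamics x_dot = u, a single spherical obstacle,
    and the barrier h(x) = ||x - center||^2 - radius^2 (positive when safe).
    Assumes x != center so the gradient of h is nonzero.
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in diff) - radius ** 2
    a = [2.0 * d for d in diff]  # gradient of h with respect to x
    b = -alpha * h               # safety condition: a . u >= b
    slack = sum(ai * ui for ai, ui in zip(a, u_des)) - b
    if slack >= 0.0:
        # The desired control is already safe, so it passes through unchanged.
        return list(u_des)
    # Otherwise project u_des onto the half-space a . u >= b, which is the
    # closed-form solution of the single-constraint quadratic program.
    norm_sq = sum(ai * ai for ai in a)
    return [ui - slack / norm_sq * ai for ui, ai in zip(u_des, a)]

# A robot at (2, 0) commanded straight at a unit obstacle at the origin.
u_safe = cbf_filter((2.0, 0.0), (-5.0, 0.0), (0.0, 0.0), 1.0)
# The same robot commanded away from the obstacle.
u_free = cbf_filter((2.0, 0.0), (1.0, 0.0), (0.0, 0.0), 1.0)
```

With a single constraint the projection has a closed form; with many constraints, as with many scene primitives, a quadratic program solver would generally be needed.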
HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting
IEEE Robotics and Automation Letters · 2025-05-30 · 5 citations
article · Senior author
3D Gaussian Splatting offers expressive scene reconstruction and can model a broad range of visual, geometric, and semantic information. However, efficient real-time map reconstruction with data streamed from multiple robots and devices remains a challenge. To that end, we propose HAMMER, a server-based multi-robot Gaussian Splatting method that leverages ROS communication infrastructure to generate 3D, metric-semantic maps from asynchronous robot data-streams. HAMMER consists of (i) a one-time frame alignment module that transforms local SLAM poses and image data into a global frame and requires no prior relative pose knowledge, and (ii) an online module for continually training semantic 3DGS maps from streaming data. HAMMER handles mixed perception modes, adjusts automatically for variations in image pre-processing among different devices, and distills CLIP semantic codes into the 3D scene for language queries. In real-world experiments, HAMMER creates better maps compared to baselines and is useful for downstream tasks, such as semantic navigation (e.g., “go to the couch”). Accompanying content at hammer-project.github.io.
SINGER: An Onboard Generalist Vision-Language Navigation Policy for Drones
ArXiv.org · 2025-09-23
preprint · Open access · Senior author
Large vision-language models have driven remarkable progress in open-vocabulary robot policies, e.g., generalist robot manipulation policies, that enable robots to complete complex tasks specified in natural language. Despite these successes, open-vocabulary autonomous drone navigation remains an unsolved challenge due to the scarcity of large-scale demonstrations, real-time control demands of drones for stabilization, and lack of reliable external pose estimation modules. In this work, we present SINGER for language-guided autonomous drone navigation in the open world using only onboard sensing and compute. To train robust, open-vocabulary navigation policies, SINGER leverages three central components: (i) a photorealistic language-embedded flight simulator with minimal sim-to-real gap using Gaussian Splatting for efficient data generation, (ii) an RRT-inspired multi-trajectory generation expert for collision-free navigation demonstrations, and these are used to train (iii) a lightweight end-to-end visuomotor policy for real-time closed-loop control. Through extensive hardware flight experiments, we demonstrate superior zero-shot sim-to-real transfer of our policy to unseen environments and unseen language-conditioned goal objects. When trained on ~700k-1M observation-action pairs of language-conditioned visuomotor data and deployed on hardware, SINGER outperforms a velocity-controlled semantic guidance baseline by reaching the query 23.33% more on average, and maintains the query in the field of view 16.67% more on average, with 10% fewer collisions.
Learning Robot Safety from Sparse Human Feedback using Conformal Prediction
arXiv (Cornell University) · 2025-01-08
preprint · Open access · Senior author
Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
Recent grants
CAREER: Controlling Ecologically Destructive Processes with a Network of Intelligent Robotic Agents
NSF · $317k · 2016–2019
NRI: FND: COLLAB: Distributed Semantically-Aware Tracking and Planning for Fleets of Robots
NSF · $468k · 2018–2022
CAREER: Controlling Ecologically Destructive Processes with a Network of Intelligent Robotic Agents
NSF · $286k · 2014–2016
NSF · $237k · 2013–2016
NSF · $375k · 2016–2019
Frequent coauthors
- Daniela Rus · 75 shared
- Eduardo Montijano · Universidad de Zaragoza · 63 shared
- Zijian Wang · Shenyang Ligong University · 61 shared
- Eric Cristofalo · 52 shared
- Riccardo Spica · Vaughn College of Aeronautics and Technology · 36 shared
- Ola Shorinwa · 33 shared
- Davide Scaramuzza · 32 shared
- Haruki Nishimura · 30 shared
Education
- 2005 · Ph.D., Aeronautics and Astronautics · Stanford University
- 2001 · M.S., Aeronautics and Astronautics · Stanford University
- 1998 · B.S., Aeronautics and Astronautics · California Institute of Technology
Awards & honors
- AIAA: Excellence in Teaching Award
- AIAA: Outstanding Course Assistant
- William F. Ballhaus Prize
- Cannon Summer Fellowship
- Hoff Outstanding Master’s Student