Josiah Hanna

· Assistant ProfessorVerified

University of Wisconsin-Madison · Computer Sciences

Active 2013–2026

h-index18

Citations1.2k

Papers9158 last 5y

Funding—

Faculty page Lab page

See your match with Josiah Hanna — sign in to PhdFit.Sign in

About

Josiah Hanna is an Assistant Professor at UW--Madison with a focus on reinforcement learning, robotics, and artificial intelligence. His research involves developing algorithms and models for autonomous robotics, with particular attention to topics such as policy gradient reinforcement learning, off-policy evaluation, and the integration of model-based approaches with deep learning. His work aims to advance the understanding and application of reinforcement learning techniques in real-world scenarios, including robotics and simulation-to-real transfer. He supervises a diverse group of students, including current PhD, MS, and undergraduate students, and has a history of mentoring alumni who have gone on to positions at leading technology companies and research institutions. His teaching portfolio includes courses on autonomous robotics, machine learning, and topics in reinforcement learning, reflecting his expertise and ongoing contributions to the field.

Research topics

Computer Science
Artificial Intelligence
Statistics
Mathematics
Econometrics

Selected publications

Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization
arXiv (Cornell University) · 2026-03-19
preprintOpen access
In this paper, we present a hardware-control co-design approach that enables efficient and versatile roller skating on quadrupedal robots equipped with passive wheels. Passive-wheel skating reduces leg inertia and improves energy efficiency, particularly at high speeds. However, the absence of direct wheel actuation tightly couples mechanical design and control. To unlock the full potential of this modality, we formulate a bilevel optimization framework: an upper-level Bayesian Optimization searches the mechanical design space, while a lower-level Reinforcement Learning trains a motor control policy for each candidate design. The resulting design-policy pairs not only outperform human-engineered baselines, but also exhibit versatile behaviors such as hockey stop (rapid braking by turning sideways to maximize friction) and self-aligning motion (automatic reorientation to improve energy efficiency in the direction of travel), offering the first system-level study of dynamic skating motion on quadrupedal robots.
Publisher DOI
Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning
arXiv (Cornell University) · 2026-03-19
preprintOpen accessSenior author
Recent work in reinforcement learning has shown that incorporating structural priors for articulated robots, such as link connectivity, into policy networks improves learning efficiency. However, dynamics properties, despite their fundamental role in determining how forces and motion propagate through the body, remain largely underexplored as an inductive bias for policy learning. To address this gap, we present the Articulated-Body Dynamics Network (ABD-Net), a novel graph neural network architecture grounded in the computational structure of forward dynamics. Specifically, we adapt the inertia propagation mechanism from the Articulated Body Algorithm, systematically aggregating inertial quantities from child to parent links in a tree-structured manner, while replacing physical quantities with learnable parameters. Embedding ABD-NET into the policy actor enables dynamics-informed representations that capture how actions propagate through the body, leading to efficient and robust policy learning. Through experiments with simulated humanoid, quadruped, and hopper robots, our approach demonstrates increased sample efficiency and generalization to dynamics shifts compared to transformer-based and GNN baselines. We further validate the learned policy on real Unitree G1 and Go2 robots, state-of-the-art humanoid and quadruped platforms, generating dynamic, versatile and robust locomotion behaviors through sim-to-real transfer with real-time inference.
Publisher DOI
Articulated-Body Dynamics Network: Dynamics-Grounded Prior for Robot Learning
ArXiv.org · 2026-03-19
articleOpen accessSenior author
Recent work in reinforcement learning has shown that incorporating structural priors for articulated robots, such as link connectivity, into policy networks improves learning efficiency. However, dynamics properties, despite their fundamental role in determining how forces and motion propagate through the body, remain largely underexplored as an inductive bias for policy learning. To address this gap, we present the Articulated-Body Dynamics Network (ABD-Net), a novel graph neural network architecture grounded in the computational structure of forward dynamics. Specifically, we adapt the inertia propagation mechanism from the Articulated Body Algorithm, systematically aggregating inertial quantities from child to parent links in a tree-structured manner, while replacing physical quantities with learnable parameters. Embedding ABD-NET into the policy actor enables dynamics-informed representations that capture how actions propagate through the body, leading to efficient and robust policy learning. Through experiments with simulated humanoid, quadruped, and hopper robots, our approach demonstrates increased sample efficiency and generalization to dynamics shifts compared to transformer-based and GNN baselines. We further validate the learned policy on real Unitree G1 and Go2 robots, state-of-the-art humanoid and quadruped platforms, generating dynamic, versatile and robust locomotion behaviors through sim-to-real transfer with real-time inference.
Publisher OA PDF
Efficient and Versatile Quadrupedal Skating: Optimal Co-design via Reinforcement Learning and Bayesian Optimization
ArXiv.org · 2026-03-19
articleOpen access
In this paper, we present a hardware-control co-design approach that enables efficient and versatile roller skating on quadrupedal robots equipped with passive wheels. Passive-wheel skating reduces leg inertia and improves energy efficiency, particularly at high speeds. However, the absence of direct wheel actuation tightly couples mechanical design and control. To unlock the full potential of this modality, we formulate a bilevel optimization framework: an upper-level Bayesian Optimization searches the mechanical design space, while a lower-level Reinforcement Learning trains a motor control policy for each candidate design. The resulting design-policy pairs not only outperform human-engineered baselines, but also exhibit versatile behaviors such as hockey stop (rapid braking by turning sideways to maximize friction) and self-aligning motion (automatic reorientation to improve energy efficiency in the direction of travel), offering the first system-level study of dynamic skating motion on quadrupedal robots.
Publisher OA PDF
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
University of Birmingham Research Portal (University of Birmingham) · 2025-05-28
preprintOpen access
This paper studies off-policy evaluation (OPE) in reinforcement learning with a focus on behavior policy estimation for importance sampling. Prior work has shown empirically that estimating a history-dependent behavior policy can lead to lower mean squared error (MSE) even when the true behavior policy is Markovian. However, the question of why the use of history should lower MSE remains open. In this paper, we theoretically demystify this paradox by deriving a bias-variance decomposition of the MSE of ordinary importance sampling (IS) estimators, demonstrating that history-dependent behavior policy estimation decreases their asymptotic variances while increasing their finite-sample biases. Additionally, as the estimated behavior policy conditions on a longer history, we show a consistent decrease in variance. We extend these findings to a range of other OPE estimators, including the sequential IS estimator, the doubly robust estimator and the marginalized IS estimator, with the behavior policy estimated either parametrically or non-parametrically.
Publisher OA PDF DOI
Adversarial Agents: Black-Box Evasion Attacks with Reinforcement Learning
ArXiv.org · 2025-03-03
preprintOpen access
Attacks on machine learning models have been extensively studied through stateless optimization. In this paper, we demonstrate how a reinforcement learning (RL) agent can learn a new class of attack algorithms that generate adversarial samples. Unlike traditional adversarial machine learning (AML) methods that craft adversarial samples independently, our RL-based approach retains and exploits past attack experience to improve the effectiveness and efficiency of future attacks. We formulate adversarial sample generation as a Markov Decision Process and evaluate RL's ability to (a) learn effective and efficient attack strategies and (b) compete with state-of-the-art AML. On two image classification benchmarks, our agent increases attack success rate by up to 13.2% and decreases the average number of victim model queries per attack by up to 16.9% from the start to the end of training. In a head-to-head comparison with state-of-the-art image attacks, our approach enables an adversary to generate adversarial samples with 17% more success on unseen inputs post-training. From a security perspective, this work demonstrates a powerful new attack vector that uses RL to train agents that attack ML models efficiently and at scale.
Publisher OA PDF DOI
Reinforcement Learning Within the Classical Robotics Stack: A Case Study in Robot Soccer
2025-05-19
articleSenior author
Robot decision-making in partially observable, real-time, dynamic, and multi-agent environments remains a difficult and unsolved challenge. Model-free reinforcement learning (RL) is a promising approach to learning decisionmaking in such domains, however, end-to-end RL in complex environments is often intractable. To address this challenge in the RoboCup Standard Platform League (SPL) domain, we developed a novel architecture integrating RL within a classical robotics stack, while employing a multi-fidelity sim2real approach and decomposing behavior into learned sub-behaviors with heuristic selection. Our architecture led to victory in the 2024 RoboCup SPL Challenge Shield Division. In this work, we fully describe our system's architecture and empirically analyze key design decisions that contributed to its success. Our approach demonstrates how RL-based behaviors can be integrated into complete robot behavior architectures.
Publisher DOI
Multi-Robot Collaboration Through Reinforcement Learning and Abstract Simulation
2025-05-19
articleSenior author
Publisher DOI
Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
ArXiv.org · 2025-08-01
preprintOpen accessSenior author
Independent on-policy policy gradient algorithms are widely used for multi-agent reinforcement learning (MARL) in cooperative and no-conflict games, but they are known to converge sub-optimally when each agent's individual policy gradient points away from an optimal joint equilibrium. Going beyond prior work, we observe that sub-optimal convergence can still arise even when the expected individual policy gradients of each agent point toward the optimal joint solution. After collecting a finite set of trajectories, stochasticity in independent action sampling can cause the joint data distribution to deviate from the expected joint on-policy distribution. This \textit{sampling error} w.r.t. the joint on-policy distribution produces inaccurate gradient estimates that can make agents converge sub-optimally. We hypothesize that joint sampling error can be reduced through coordinated action selection and that doing so will increase the reliability of policy gradient learning in MARL (i.e., the probability of converging to an optimal joint policy). To test this hypothesis, we first introduce an adaptive action sampling approach to reduce joint sampling error in the Centralized Training with Decentralized Execution setting. Our method, Cooperative Sampling Error Reduction (CoSER), continually adapts a centralized behavior policy to place higher probability on joint actions that are under-sampled w.r.t. the current joint policy. We then empirically evaluate CoSER on a diverse set of multi-agent games and demonstrate that (1) CoSER reduces joint sampling error more efficiently than independent on-policy sampling and (2) this reduction increases the reliability of independent policy gradient algorithms.
Publisher OA PDF DOI
Reinforcement Learning via Auxiliary Task Distillation
Lecture notes in computer science · 2024-10-31
book-chapter
Publisher DOI

Frequent coauthors

Peter Stone
41 shared
Stefano V. Albrecht
15 shared
Garrett Warnell
15 shared
Guni Sharon
Texas A&M University
14 shared
Brahma S. Pavse
12 shared
Haresh Karnan
11 shared
Siddharth Desai
University of Kentucky
11 shared
Scott Niekum
University of Massachusetts Amherst
11 shared

Labs

Josiah Hanna LabPI
Students

Education

B.S., Computer Science and Mathematics
University of Kentucky
Ph.D., Computer Science
University of Texas at Austin

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Josiah Hanna

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you