Gregory D. Hager

Mandell Bellmore Professor

Johns Hopkins University · Radiology and Radiological Science

Active 1963–2026

h-index: 74

Citations: 27.4k

Papers: 655 (125 last 5y)

Funding: $12.1M (1 active)

About

Gregory D. Hager is the Mandell Bellmore Professor of Computer Science at Johns Hopkins University, with joint appointments in the Department of Electrical and Computer Engineering and the Department of Mechanical Engineering. He is renowned for his research in collaborative and vision-based robotics, time-series analysis of image data, and medical applications of image analysis and robotics. Hager has published over 300 articles and books on these topics and is the founding director of the Johns Hopkins Malone Center for Engineering in Healthcare, an interdisciplinary research center focused on developing innovative healthcare technology and systems. His significant contributions to vision-based robotics have earned him recognition as an IEEE Fellow, as well as fellowships from the MICCAI Society, the Association for Computing Machinery (ACM), the American Institute for Medical and Biological Engineering (AIMBE), and the American Association for the Advancement of Science (AAAS). In 2014, he was awarded a Hans Fischer Fellowship at the Technical University of Munich’s Institute of Advanced Study, where he also holds an appointment in computer science. Hager co-founded two startups: Clear Guide Medical, which provides a platform for more accurate ultrasound-guided procedures, and Ready Robotics, which aims to make industrial robots easier to use. He earned his BA in mathematics and computer science summa cum laude from Luther College in 1983, followed by an MS in 1986 and a PhD in 1988 from the University of Pennsylvania. He was a Fulbright Fellow at the University of Karlsruhe and served on the faculty of Yale University before joining Johns Hopkins in 1999. He has held leadership roles including deputy director of the NSF Engineering Research Center for Computer-Integrated Surgical Systems and Technology and chair of the Department of Computer Science from 2010 to 2015.
Hager leads the Computational Interaction and Robotics Lab (CIRL), which investigates dynamic, spatial interaction at the intersection of imaging, robotics, and human-computer interaction. His work in real-time computer vision algorithms and their applications in robotics has enabled advances in automated surgical training, medical imaging and diagnostics, and computer-enhanced interventional medicine. CIRL is affiliated with the NSF Engineering Research Center for Computer-Integrated Surgical Systems and Technology and the Laboratory for Computational Sensing and Robotics.

Research topics

Computer Science
Artificial Intelligence
Engineering
Computer vision
Human–computer interaction
Simulation
Machine Learning
World Wide Web
Pathology
Knowledge management
Medicine
Data science
Management science
Mathematics

Selected publications

Humanoid Robots as First Assistants in Endoscopic Surgery
ArXiv.org · 2026-02-27
article · Open access
Humanoid robots have become a focal point of technological ambition, with claims of surgical capability within years in mainstream discourse. These projections are aspirational yet lack empirical grounding. To date, no humanoid has assisted a surgeon through an actual procedure, let alone performed one. The work described here breaks this new ground. Here we report a proof of concept in which a teleoperated Unitree G1 provided endoscopic visualization while an attending otolaryngologist performed a cadaveric sphenoidectomy. The procedure was completed successfully, with stable visualization maintained throughout. Teleoperation allowed assessment of whether the humanoid form factor could meet the physical demands of surgical assistance in terms of sustenance and precision; the cognitive demands were satisfied -- for now -- by the operator. Post-procedure analysis identified engineering targets for clinical translation, alongside near-term opportunities such as autonomous diagnostic scoping. This work establishes form-factor feasibility for humanoid surgical assistance while identifying challenges for continued development.
Humanoid Robots as First Assistants in Endoscopic Surgery
Open MIND · 2026-02-27
preprint
Final Report, Center for Computer-Integrated Surgical Systems and Technology, NSF ERC Cooperative Agreement EEC9731748, Volume 1
ArXiv.org · 2026-04-07
article · Open access
In the last ten years, medical robotics has moved from the margins to the mainstream. Since the Engineering Research Center for Computer-Integrated Surgical Systems and Technology was launched in 1998 with National Science Foundation funding, medical robots have been promoted from handling routine tasks to performing highly sophisticated interventions and related assignments. The CISST ERC has played a significant role in this transformation. And thanks to NSF support, the ERC has built the professional infrastructure that will continue our mission: bringing data and technology together in clinical systems that will dramatically change how surgery and other procedures are done. The enhancements we envision touch virtually every aspect of the delivery of care:
- More accurate procedures
- More consistent, predictable results from one patient to the next
- Improved clinical outcomes
- Greater patient safety
- Reduced liability for healthcare providers
- Lower costs for everyone - patients, facilities, insurers, government
- Easier, faster recovery for patients
- Effective new ways to treat health problems
- Healthier patients, and a healthier system
The basic science and engineering the ERC is developing now will yield profound benefits for all concerned about health care - from government agencies to insurers, from clinicians to patients to the general public. All will experience the healing touch of medical robotics, thanks in no small part to the work of the CISST ERC and its successors.
CAST: Evaluating Multi-Object Trackers with Context-Aware Switch and Transfer Scores
2026-03-06
article · Senior author
Multi-object tracking (MOT) has been a subject of intensive research for decades. Multiple standard datasets and benchmarks have been set up, and several evaluation metrics, such as MOTA, IDF1 and HOTA, have been proposed. These metrics have become the de facto standard for comparing and ranking trackers on standardized datasets to measure progress. In this paper, we focus on MOTA and HOTA, and present a study of cases where these metrics’ behaviors may not be desirable. In addition, we demonstrate how they might not be ideal when used as a tool to inspect a tracker’s failure cases. We point out that these issues are related to the sizes of the context windows in which they measure association quality, where MOTA is too nearsighted while HOTA can be too holistic depending on the task settings. In this paper, we rethink the familiar notion of identity switches (IDSw) proposed in MOTA, and propose a generalized version of it by introducing a context window when evaluating the ID assignment choice for each detection. We show that the proposed metric, CAST, mitigates the limitations of MOTA and HOTA, and demonstrate its usefulness when diagnosing model failures through examples. Our code and toolkit will be made available at https://github.com/bkkm78/cast.
Final Report, Center for Computer-Integrated Surgical Systems and Technology, NSF ERC Cooperative Agreement EEC9731748, Volume 1
arXiv (Cornell University) · 2026-04-07
preprint · Open access
Investigating a Policy-Based Formulation for Endoscopic Camera Pose Recovery
arXiv (Cornell University) · 2026-03-20
preprint · Open access
In endoscopic surgery, surgeons continuously locate the endoscopic view relative to the anatomy by interpreting the evolving visual appearance of the intraoperative scene in the context of their prior knowledge. Vision-based navigation systems seek to replicate this capability by recovering camera pose directly from endoscopic video, but most approaches do not embody the same principles of reasoning about new frames that make surgeons successful. Instead, they remain grounded in feature matching and geometric optimization over keyframes, an approach that has been shown to degrade under the challenging conditions of endoscopic imaging, such as low texture and rapid illumination changes. Here, we pursue an alternative approach and investigate a policy-based formulation of endoscopic camera pose recovery that seeks to imitate experts in estimating trajectories conditioned on the previous camera state. Our approach directly predicts short-horizon relative motions without maintaining an explicit geometric representation at inference time. It thus addresses, by design, some of the notorious challenges of geometry-based approaches, such as brittle correspondence matching, instability in texture-sparse regions, and limited pose coverage due to reconstruction failure. We evaluate the proposed formulation on cadaveric sinus endoscopy. Under oracle state conditioning, we compare short-horizon motion prediction quality to geometric baselines, achieving the lowest mean translation error and competitive rotational accuracy. We analyze robustness by grouping prediction windows according to texture richness and illumination change, and the results indicate reduced sensitivity to low-texture conditions. These findings suggest that a learned motion policy offers a viable alternative formulation for endoscopic camera pose recovery.
A Training-Free Approach for 3D Reconstruction from Monocular Sinus Endoscopy
Lecture Notes in Computer Science · 2026-01-01 · 1 citation
book-chapter
325 3D Skull Base Reconstruction Using Publicly Available Foundational AI Models and Endoscope Video
Neurosurgery · 2025-03-14 · 1 citation
article
INTRODUCTION: The ablative nature of surgery means that pre-operative imaging studies lose correspondence as a case progresses, which can be problematic when accurate intraoperative navigation is required. Accurate 3D surface reconstruction from endoscopic video is a potential strategy for real-time intraoperative imaging updates without additional equipment. We have previously used traditional computational models to generate skull base reconstructions. However, they are time-consuming and require technical skills to process the video. Recent foundational AI models, like DUST3R, are an opportunity for timely, generalizable reconstructions of surgical anatomy. METHODS: We compared our previously described three-step reconstruction process with DUST3R to generate a water-tight 3D mesh of cadaveric skull base anatomy visualized using a Karl Storz Image 1 Hub HD Video Camera fitted with a 0° rigid endoscope. For DUST3R, we selected four video frames and did not perform any training or fine-tuning. RESULTS: Without endoscope calibration and using only the four input frames, DUST3R created 3D surface reconstructions in less than two minutes. This is compared to the three-step reconstruction process, which requires 8 to 12 hours to reconstruct. CONCLUSIONS: Our findings show that DUST3R, a publicly available foundational AI model, can rapidly generate 3D anatomical reconstructions from a limited set of video frames. Models like DUST3R illustrate the rapidly evolving potential of computer vision. With fine-tuning, they may represent a path toward foundational AI models that generalize across surgical procedures.
Self-Supervised Feature Detection and 3D Reconstruction for Real-Time Neuroendoscopic Guidance
IEEE Transactions on Biomedical Engineering · 2025-02-04
article · Open access
OBJECTIVE: Transventricular approach to deep-brain targets offers direct visualization but also imparts deformation that challenges accurate neuronavigation. 3D reconstruction and registration of the endoscopic view could provide up-to-date, real-time guidance. We develop and evaluate a self-supervised feature detection method for 3D reconstruction and navigation in neuroendoscopy. METHODS: Unlabeled neuroendoscopic video data from 15 clinical cases yielding 11,527 video frames were used to train a self-supervised learning method (R2D2-E) with 5-fold cross validation integrated into a simultaneous localization and mapping (SLAM) pipeline for 3D reconstruction. A series of experiments guided nominal hyperparameter selection and evaluated performance in comparison to SIFT, SURF and SuperPoint in terms of the accuracy of feature matching and 3D reconstruction. RESULTS: R2D2-E demonstrated superior performance in feature matching and 3D reconstruction. R2D2-E features achieved a median projected error of 0.64 mm compared to 0.90 mm, 0.99 mm and 0.83 mm error for SIFT, SURF and SuperPoint, respectively. The method also improved F1 score by 14%, 25% and 22% compared to SIFT, SURF and SuperPoint, respectively. CONCLUSION: The proposed feature detection approach enables accurate, real-time 3D reconstruction in neuroendoscopy, offering robust feature detection in the presence of endoscopic artifacts and providing up-to-date navigation following soft-tissue deformation. SIGNIFICANCE: The self-supervised feature detection method advances capabilities for vision-based guidance and augmented visualization of target structures in neuroendoscopic procedures. The approach could enhance the accuracy and precision of neurosurgery to improve patient outcomes.
Monocular Vision‐Based Endoscopic Sinus Navigation: A SLAM Driven Approach With CT Integration
Healthcare Technology Letters · 2025-01-01 · 1 citation
article · Open access
Surgical navigation is critical in sinus surgery to enhance the surgeon's spatial awareness and improve precision, particularly around occluded critical structures. While external tracker-based navigation systems exist, vision-based solutions are preferred for being less intrusive and for enabling endoscopic image analysis to assist surgeons. However, monocular endoscopy navigation faces challenges associated with monocular reconstruction and camera pose estimation. This paper presents a proof of concept for monocular vision-based sinus navigation that utilizes only preoperative CT data and the endoscope video stream to navigate the sinus anatomy. We developed a vision-based navigation system that incorporates a SLAM algorithm to estimate the camera pose and reconstruct the 3D surface of the anatomy. Given an initial semi-automated registration, the algorithm maps the SLAM-based trajectories to the CT space while employing the reconstructed point cloud to solve for the scale interactively. The system displays the updates in the CT triplane visualization as SLAM reconstructs the scene and recovers pose information. We tested our system by performing off-site navigation on ten recorded endoscopic video streams generated from sequences obtained from eight cadaveric subjects, comparing the vision-based navigation to reference optical tracker pose data and obtaining translation and rotation errors of 3.2 mm and 4.9 degrees, respectively. Additionally, we performed three on-site tests of our system on two different cadaver experiments. Our work evaluates a fully integrated system that closes the loop between image-based reconstruction and CT visualization, and discusses the challenges that must be addressed to achieve clinical-level surgical navigation.

Recent grants

ITR: Modeling Synthesis and Analysis of Human-Machine Collaborative Systems
NSF · $1.1M · 2002–2008
Manipulating and Perceiving Simultaneously (MAPS) for Haptic Object Recognition
NSF · $216k · 2007–2010
CPS:Medium:Hybrid Systems for Modeling and Teaching the Language of Surgery
NSF · $1.5M · 2009–2013
Structure Induction for Manipulative and Interactive Tasks
NSF · $492k · 2006–2010
Enhanced Navigation for Endoscopic Sinus Surgery Through Video Analysis
NIH · $1.8M · 2012–2017

Frequent coauthors

Russell H. Taylor
85 shared
Masaru Ishii
Johns Hopkins Medicine
67 shared
Emad M. Boctor
Johns Hopkins University
39 shared
Austin Reiter
Meta (Israel)
37 shared
S. Swaroop Vedula
Malone University
34 shared
Chris Paxton
31 shared
Gábor Fichtinger
Queen's University
30 shared
Nassir Navab
27 shared

Education

Ph.D.
University of Pennsylvania
1988
M.S.
University of Pennsylvania
1986
B.A., Mathematics and Computer Science
Luther College
1983

Awards & honors

Hans Fischer Fellowship at the Technical University of Munic…
