
Alan Yuille
Bloomberg Distinguished Professor · Johns Hopkins University · Radiology and Radiological Science
Active 1966–2026
About
Alan Yuille is a Bloomberg Distinguished Professor of Cognitive Science and Computer Science at Johns Hopkins University, holding joint primary appointments in these departments. His research interests include computational models of vision, mathematical models of cognition, medical image analysis, artificial intelligence, and neural networks. Dr. Yuille's work spans several disciplines, including computer vision, vision science, and neuroscience. He directs the research group on Computational Cognition, Vision, and Learning (CCVL) and is affiliated with the Center for Brains, Minds and Machines, as well as the NSF Expedition in Computing, Visual Cortex on Silicon. He received a BA degree in mathematics from the University of Cambridge in 1976 and completed his PhD in theoretical physics at Cambridge in 1981 under the supervision of Prof. S.W. Hawking. His career includes positions as a research scientist at MIT's Artificial Intelligence Laboratory and Harvard University’s Division of Applied Sciences, as well as roles as an assistant and associate professor at Harvard until 1996. He was a senior research scientist at the Smith-Kettlewell Eye Research Institute from 1996 to 2002 and served as a full professor at UCLA with joint appointments in computer science, psychiatry, and psychology. Dr. Yuille joined Johns Hopkins University in January 2016 as a Bloomberg Distinguished Professor, where he continues his research and teaching.
Research topics
- Computer Science
- Artificial Intelligence
- Computer vision
- Mathematics
- Machine Learning
- Engineering
- Theoretical computer science
- Mathematical optimization
- Algorithm
- Programming language
Selected publications
IEEE Transactions on Medical Imaging · 2026-01-01 · 2 citations
article
Tumor synthesis can generate challenging cases that AI often misses or over-detects. Training on these cases improves AI performance. However, most existing synthesis methods are either unconditional (generating images from random variables) or conditioned only on tumor shape. As a result, they lack control over clinically important tumor characteristics, such as texture, heterogeneity, boundary, and pathology. The generated tumors are therefore overly similar to, or duplicates of, existing training cases, failing to effectively address AI's weaknesses. We propose a new text-driven tumor synthesis approach, termed TextoMorph, that provides textual control over tumor characteristics in conjunction with mask control. This approach is particularly beneficial for examples that confuse the AI the most, such as early tumor detection (improving Sensitivity by +6.5%), tumor segmentation for precise radiotherapy (improving NSD by +3.1%), and classification between benign and malignant tumors (improving Sensitivity by +8.2%). By incorporating text mined from radiology reports into the synthesis process, we increase the variability and controllability of the synthetic tumors to target AI's failure cases more precisely. Moreover, TextoMorph uses contrastive learning across different texts and CT scans, significantly reducing dependence on scarce image-report pairs (only 141 pairs used in this study) by leveraging a large corpus of 34,035 radiology reports. Finally, we have developed rigorous tests to evaluate synthetic tumors, showing that our synthetic tumors are realistic and diverse in texture, heterogeneity, boundary, and pathology. Code and models are available at https://github.com/MrGiovanni/TextoMorph.
arXiv (Cornell University) · 2026-04-08
article · Open access
Photon-counting CT (PCCT) provides superior image quality with higher spatial resolution and lower noise compared to conventional energy-integrating CT (EICT), but its limited clinical availability restricts large-scale research and clinical deployment. To bridge this gap, we propose SUMI, a simulated degradation-to-enhancement method that learns to reverse realistic acquisition artifacts in low-quality EICT by leveraging high-quality PCCT as reference. Our central insight is to explicitly model realistic acquisition degradations, transforming PCCT into clinically plausible lower-quality counterparts and learning to invert this process. The simulated degradations were validated for clinical realism by board-certified radiologists, enabling faithful supervision without requiring paired acquisitions at scale. As outcomes of this technical contribution, we: (1) train a latent diffusion model on 1,046 PCCTs, using an autoencoder first pre-trained on both these PCCTs and 405,379 EICTs from 145 hospitals to extract general CT latent features that we release for reuse in other generative medical imaging tasks; (2) construct a large-scale dataset of over 17,316 publicly available EICTs enhanced to PCCT-like quality, with radiologist-validated voxel-wise annotations of airway trees, arteries, veins, lungs, and lobes; and (3) demonstrate substantial improvements: across external data, SUMI outperforms state-of-the-art image translation methods by 15% in SSIM and 20% in PSNR, improves radiologist-rated clinical utility in reader studies, and enhances downstream top-ranking lesion detection performance, increasing sensitivity by up to 15% and F1 score by up to 10%. Our results suggest that emerging imaging advances can be systematically distilled into routine EICT using limited high-quality scans as reference.
4D-Animal: Freely Reconstructing Animatable 3D Animals from Videos
2026-03-06
article · Open access
Reconstructing animatable 3D animals from videos traditionally depends on sparse semantic keypoints to fit parametric models. Acquiring these keypoints is labor-intensive, and detectors trained on limited animal datasets are often unreliable. We propose 4D-Animal, a keypoint-free framework that reconstructs animatable 3D animals directly from videos. Our method employs a dense feature network to map 2D image representations to SMAL parameters, improving both efficiency and stability. Additionally, we introduce a hierarchical alignment strategy that leverages silhouette, part-level, pixel-level, and temporal cues from pretrained 2D models, ensuring accurate and temporally coherent reconstructions. Extensive experiments demonstrate that 4D-Animal outperforms both model-based and model-free baselines on a dog dataset. Moreover, the high-quality 3D assets generated by our method can benefit other 3D tasks, underscoring its potential for large-scale applications. The code is released at https://github.com/zhongshsh/4D-Animal.
A Comprehensive Survey of Agentic AI in Healthcare
2025-11-06
article · Open access
The rapid advancement of large language models (LLMs) has accelerated the development of agentic AI. This paradigm carries significant implications for healthcare, an ecosystem characterized by its knowledge intensity and complex decision-making requirements. At the same time, the high-stakes and safety-critical nature of healthcare poses unique challenges, making general-purpose agentic frameworks often inadequate. The autonomy that defines these agents introduces additional concerns regarding trust, reliability, and alignment with clinical constraints. With research in this area growing exponentially over the past two years, this comprehensive survey provides a systematic map, analyzing over 200 recent studies. We propose a holistic taxonomy that traces the full lifecycle of healthcare agents, beginning with the Perception of diverse clinical modalities. From there, it examines the core Agentic Capabilities and Architectures that enable autonomous action. We then map the Application Ecosystem by organizing use cases around the key stakeholders they serve (clinicians, patients, researchers, and administrators) and conclude with a review of Evaluation frameworks. Furthermore, we discuss the limitations of existing methods from diverse perspectives and systematically propose future research directions. To facilitate ongoing research, a curated list of all related papers is available at https://github.com/AgenticHealthAI/Awesome-Agentic-AI-for-Healthcare/.
Journal of Exposure Science & Environmental Epidemiology · 2025-08-12
article · Open access
Mixture of Contexts for Long Video Generation
ArXiv.org · 2025-08-28 · 1 citation
preprint · Open access
Long video generation is fundamentally a long context memory problem: models must retain and retrieve salient events across a long range without collapsing or drifting. However, scaling diffusion transformers to generate long-context videos is fundamentally limited by the quadratic cost of self-attention, which makes memory and computation intractable and difficult to optimize for long sequences. We recast long-context video generation as an internal information retrieval task and propose a simple, learnable sparse attention routing module, Mixture of Contexts (MoC), as an effective long-term memory retrieval engine. In MoC, each query dynamically selects a few informative chunks plus mandatory anchors (caption, local windows) to attend to, with causal routing that prevents loop closures. As we scale the data and gradually sparsify the routing, the model allocates compute to salient history, preserving identities, actions, and scenes over minutes of content. Efficiency follows as a byproduct of retrieval (near-linear scaling), which enables practical training and synthesis, and the emergence of memory and consistency at the scale of minutes.
Learning Segmentation from Radiology Reports
Lecture Notes in Computer Science · 2025-09-19 · 1 citation
book-chapter
Journal of Exposure Science & Environmental Epidemiology · 2025-10-14 · 2 citations
article · Open access
TGT: Text-Grounded Trajectories for Locally Controlled Video Generation
ArXiv.org · 2025-10-16
preprint · Open access
Text-to-video generation has advanced rapidly in visual fidelity, whereas standard methods still have limited ability to control the subject composition of generated scenes. Prior work shows that adding localized text control signals, such as bounding boxes or segmentation masks, can help. However, these methods struggle in complex scenarios and degrade in multi-object settings, offering limited precision and lacking a clear correspondence between individual trajectories and visual entities as the number of controllable objects increases. We introduce Text-Grounded Trajectories (TGT), a framework that conditions video generation on trajectories paired with localized text descriptions. We propose Location-Aware Cross-Attention (LACA) to integrate these signals and adopt a dual-CFG scheme to separately modulate local and global text guidance. In addition, we develop a data processing pipeline that produces trajectories with localized descriptions of tracked entities, and we annotate two million high-quality video clips to train TGT. Together, these components enable TGT to use point trajectories as intuitive motion handles, pairing each trajectory with text to control both appearance and motion. Extensive experiments show that TGT achieves higher visual quality, more accurate text alignment, and improved motion controllability compared with prior approaches. Website: https://textgroundedtraj.github.io.
Recent grants
Collaborative Research: Visual Cortex on Silicon
NSF · $750k · 2013–2017
NIH · $344k · 2015
NIH · $1.2M · 2016
NIH · $1.1M · 2002
NIH · $996k · 2006
Frequent coauthors
- 103 shared
Adam Kortylewski
University of Freiburg
- 100 shared
Elliot K. Fishman
Johns Hopkins University
- 87 shared
Yuyin Zhou
University of California, Santa Cruz
- 83 shared
Wei Shen
- 81 shared
Cihang Xie
- 81 shared
Lingxi Xie
- 53 shared
Siyuan Qiao
- 53 shared
Weichao Qiu
Huizhou University
Education
- 1981
Ph.D., Applied Mathematics and Theoretical Physics
University of Cambridge
- 1981
Post Doc Fellow (N.A.T.O.), Theoretical Physics
The University of Texas at Austin
- 1981
Post Doc Fellow (N.A.T.O.), Theoretical Physics
University of California, Santa Barbara
- 1977
Distinction Part 3, Mathematics Tripos
University of Cambridge
- 1976
B.A., Mathematics
University of Cambridge