Narendra Ahuja · Research Professor

University of Illinois Urbana-Champaign · Statistics and Computer Science

Active 1946–2026

h-index: 75
Citations: 31.0k
Papers: 545 (45 in the last 5 years)
Funding: $711k

About

Narendra Ahuja is the Donald Biggar Willett Professor Emeritus and Research Professor in Electrical and Computer Engineering at the University of Illinois Urbana-Champaign. His research focuses on computer vision, pattern recognition, robotics, image processing, sensors, virtual environments, and intelligent interfaces. He has introduced a new computational approach to automatically extracting the syntax of images for automated image understanding, enabling the discovery, modeling, recognition, and explanation of object categories in arbitrary image sets without supervision, and organizing these categories into taxonomies. Additionally, he developed a Fourier-based formulation for the representation and synthesis of videos of dynamic textures, analyzing dynamic textures in both spatial and temporal domains. Ahuja has co-developed interdisciplinary courses on knowledge networks and image structure, content, and depiction, and has been recognized with numerous awards including the IEEE Emanuel R. Piore Award, SPIE Technology Achievement Award, and fellowships from IEEE, ACM, SPIE, IAPR, and AAAI. His work has significantly contributed to advancing automated image understanding and dynamic texture analysis within the field of computer vision.

Research topics

  • Computer Science
  • Business
  • Systems engineering
  • Psychology
  • Data science
  • Engineering
  • Risk analysis (engineering)

Selected publications

  • Can MLLMs Find Their Way in a City? Exploring Emergent Navigation from Web-Scale Knowledge

    2026-01-01

    article · Open access

  • Finding Distributed Object-Centric Properties in Self-Supervised Transformers

    arXiv (Cornell University) · 2026-03-27

    preprint · Open access · Senior author

    Self-supervised Vision Transformers (ViTs) like DINO show an emergent ability to discover objects, typically observed in [CLS] token attention maps of the final layer. However, these maps often contain spurious activations resulting in poor localization of objects. This is because the [CLS] token, trained on an image-level objective, summarizes the entire image instead of focusing on objects. This aggregation dilutes the object-centric information existing in the local, patch-level interactions. We analyze this by computing inter-patch similarity using patch-level attention components (query, key, and value) across all layers. We find that: (1) Object-centric properties are encoded in the similarity maps derived from all three components ($q, k, v$), unlike prior work that uses only key features or the [CLS] token. (2) This object-centric information is distributed across the network, not just confined to the final layer. Based on these insights, we introduce Object-DINO, a training-free method that extracts this distributed object-centric information. Object-DINO clusters attention heads across all layers based on the similarities of their patches and automatically identifies the object-centric cluster corresponding to all objects. We demonstrate Object-DINO's effectiveness on two applications: enhancing unsupervised object discovery (+3.6 to +12.4 CorLoc gains) and mitigating object hallucination in Multimodal Large Language Models by providing visual grounding. Our results demonstrate that using this distributed object-centric information improves downstream tasks without additional training.

  • Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models

    arXiv (Cornell University) · 2026-04-30

    preprint · Open access · Senior author

    Safety trained large language models (LLMs) can often be induced to answer harmful requests through jailbreak prompts. Because we lack a robust understanding of why LLMs are susceptible to jailbreaks, future frontier models operating more autonomously in higher-stakes settings may similarly be vulnerable to such attacks. Prior work has studied jailbreak success by examining the model's intermediate representations, identifying directions in this space that causally encode concepts like harmfulness and refusal. Then, they globally explain all jailbreak attacks as attempting to reduce or strengthen these concepts (e.g., reduce harmfulness). However, different jailbreak strategies may succeed by strengthening or suppressing different intermediate concepts, and the same jailbreak strategy may not work for different harmful request categories (e.g., violence vs. cyberattack); thus, we seek to give a local explanation -- i.e., why did this specific jailbreak succeed? To address this gap, we introduce LOCA, a method that gives Local, CAusal explanations of jailbreak success by identifying a minimal set of interpretable, intermediate representation changes that causally induce model refusal on an otherwise successful jailbreak request. We evaluate LOCA on harmful original-jailbreak pairs from a large jailbreak benchmark across Gemma and Llama chat models, comparing against prior methods adapted to this setting. LOCA can successfully induce refusal by making, on average, six interpretable changes; prior work routinely fails to achieve refusal even after 20 changes. LOCA is a step toward mechanistic, local explanations of jailbreak success in LLMs. Code to be released.

  • RigMo: Unifying Rig and Motion Learning for Generative Animation

    ArXiv.org · 2026-01-10

    article · Open access

    Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents can naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs, while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.

  • VideoMind: Thinking in Steps for Long Video Understanding

    2026-01-01

    article · Open access

    Shubhang Bhatnagar, Renxiong Wang, Kapil Krishnakumar, Adel Ahmadyan, Zhaojiang Lin, Lambert Mathias, Xin Luna Dong, Babak Damavandi, Narendra Ahuja, Seungwhan Moon. Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track). 2026.

  • PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling

    2025-10-19 · 1 citation

    article · Senior author

  • Language-driven Description Generation and Common Sense Reasoning for Video Action Recognition

    ArXiv.org · 2025-06-20

    preprint · Open access · Senior author

    Recent video action recognition methods have shown excellent performance by adapting large-scale pre-trained language-image models to the video domain. However, language models contain rich common sense priors - the scene contexts that humans use to constitute an understanding of objects, human-object interactions, and activities - that have not been fully exploited. In this paper, we introduce a framework incorporating language-driven common sense priors to identify cluttered video action sequences from monocular views that are often heavily occluded. We propose: (1) A video context summary component that generates candidate objects, activities, and the interactions between objects and activities; (2) A description generation module that describes the current scene given the context and infers subsequent activities, through auxiliary prompts and common sense reasoning; (3) A multi-modal activity recognition head that combines visual and textual cues to recognize video actions. We demonstrate the effectiveness of our approach on the challenging Action Genome and Charades datasets.

Frequent coauthors

  • Ming-Hsuan Yang

    47 shared
  • Thomas S. Huang

    34 shared
  • Bernard Ghanem

    King Abdullah University of Science and Technology

    33 shared
  • Juyang Weng

    29 shared
  • Jia-Bin Huang

    21 shared
  • John M. Hart

    University of Illinois Urbana-Champaign

    18 shared
  • Qingxiong Yang

    17 shared
  • Tianzhu Zhang

    17 shared

Education

  • Ph.D., Electrical Engineering

    University of California, Berkeley

    1986
  • M.S., Electrical Engineering

    University of California, Berkeley

    1981
  • B.S., Electrical Engineering

    Indian Institute of Technology, Kanpur

    1977

Awards & honors

  • Best Paper Award from IEEE Transactions on Multimedia, 2006
  • Associate in the Center for Advanced Study, 2005-06
  • On Incomplete List of Teachers Ranked Excellent by Their Stu…
  • 1999 UIUC Campus Award for Guiding Undergraduate Research -…
  • 1999 Donald Biggar Willett Professorship of UIUC College of E…