Jim Rehg

Professor, Director of the Health Care Engineering Systems Center · Verified

University of Illinois Urbana-Champaign · Industrial and Enterprise Systems Engineering

Active 1992–2026

h-index: 74
Citations: 23.9k
Papers: 429 (118 in the last 5 years)
Funding: $37.4M (1 active)

About

Jim Rehg is a professor and the Director of the Health Care Engineering Systems Center at the University of Illinois Urbana-Champaign. His research areas include human factors and health technology, and his recent courses include CS 598 CVH (Computer Vision for Health). Rehg develops computational tools for health-related behaviors and is actively involved in health-related research and education within the Grainger College of Engineering.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Human–computer interaction
  • Machine Learning
  • Multimedia
  • Computer vision
  • Psychology

Selected publications

  • IAM: Identity-Aware Human Motion and Shape Joint Generation

    arXiv (Cornell University) · 2026-04-28

    preprint · Open access

    Recent advances in text-driven human motion generation enable models to synthesize realistic motion sequences from natural language descriptions. However, most existing approaches assume identity-neutral motion and generate movements using a canonical body representation, ignoring the strong influence of body morphology on motion dynamics. In practice, attributes such as body proportions, mass distribution, and age significantly affect how actions are performed, and neglecting this coupling often leads to physically inconsistent motions. We propose an identity-aware motion generation framework that explicitly models the relationship between body morphology and motion dynamics. Instead of relying on explicit geometric measurements, identity is represented using multimodal signals, including natural language descriptions and visual cues. We further introduce a joint motion-shape generation paradigm that simultaneously synthesizes motion sequences and body shape parameters, allowing identity cues to directly modulate motion dynamics. Extensive experiments on motion capture datasets and large-scale in-the-wild videos demonstrate improved motion realism and motion-identity consistency while maintaining high motion quality. Project page: https://vjwq.github.io/IAM

  • Naturalistic Language Recordings Reveal “Hypervocal” Infants at High Familial Risk for Autism

    UNC Libraries · 2026-02-07

    article · Open access

    Children's early language environments are related to later development. Little is known about this association in siblings of children with autism spectrum disorder (ASD), who often experience language delays or have ASD. Fifty-nine 9-month-old infants at high or low familial risk for ASD contributed full-day in-home language recordings. High-risk infants produced more vocalizations than low-risk peers; conversational turns and adult words did not differ by group. Vocalization differences were driven by a subgroup of "hypervocal" infants. Despite more vocalizations overall, these infants engaged in less social babbling during a standardized clinic assessment, and they experienced fewer conversational turns relative to their rate of vocalizations. Two ways in which these individual and environmental differences may relate to subsequent development are discussed.

  • Narrative-Driven Paper-to-Slide Generation via ArcDeck

    ArXiv.org · 2026-04-13

    article · Open access · Senior author

    We introduce ArcDeck, a multi-agent framework that formulates paper-to-slide generation as a structured narrative reconstruction task. Unlike existing methods that directly summarize raw text into slides, ArcDeck explicitly models the source paper's logical flow. It first parses the input to construct a discourse tree and establish a global commitment document, ensuring the high-level intent is preserved. These structural priors then guide an iterative multi-agent refinement process, where specialized agents iteratively critique and revise the presentation outline before rendering the final visual layouts and designs. To evaluate our approach, we also introduce ArcBench, a newly curated benchmark of academic paper-slide pairs. Experimental results demonstrate that explicit discourse modeling, combined with role-specific agent coordination, significantly improves the narrative flow and logical coherence of the generated presentations.

  • EgoForge: Goal-Directed Egocentric World Simulator

    arXiv (Cornell University) · 2026-03-20

    preprint · Open access

    Generative world models have shown promise for simulating dynamic environments, yet egocentric video remains challenging due to rapid viewpoint changes, frequent hand-object interactions, and goal-directed procedures whose evolution depends on latent human intent. Existing approaches either focus on hand-centric instructional synthesis with limited scene evolution, perform static view translation without modeling action dynamics, or rely on dense supervision, such as camera trajectories, long video prefixes, synchronized multicamera capture, etc. In this work, we introduce EgoForge, an egocentric goal-directed world simulator that generates coherent, first-person video rollouts from minimal static inputs: a single egocentric image, a high-level instruction, and an optional auxiliary exocentric view. To improve intent alignment and temporal consistency, we propose VideoDiffusionNFT, a trajectory-level reward-guided refinement that optimizes goal completion, temporal causality, scene consistency, and perceptual fidelity during diffusion sampling. Extensive experiments show EgoForge achieves consistent gains in semantic alignment, geometric stability, and motion fidelity over strong baselines, and robust performance in real-world smart-glasses experiments.

  • Omni-MMSI: Toward Identity-attributed Social Interaction Understanding

    arXiv (Cornell University) · 2026-03-31

    preprint · Open access

    We introduce Omni-MMSI, a new task that requires comprehensive social interaction understanding from raw audio, vision, and speech input. The task involves perceiving identity-attributed social cues (e.g., who is speaking what) and reasoning about the social interaction (e.g., whom the speaker refers to). This task is essential for developing AI assistants that can perceive and respond to human interactions. Unlike prior studies that operate on oracle-preprocessed social cues, Omni-MMSI reflects realistic scenarios where AI assistants must perceive and reason from raw data. However, existing pipelines and multi-modal LLMs perform poorly on Omni-MMSI because they lack reliable identity attribution capabilities, which leads to inaccurate social interaction understanding. To address this challenge, we propose Omni-MMSI-R, a reference-guided pipeline that produces identity-attributed social cues with tools and conducts chain-of-thought social reasoning. To facilitate this pipeline, we construct participant-level reference pairs and curate reasoning annotations on top of the existing datasets. Experiments demonstrate that Omni-MMSI-R outperforms advanced LLMs and counterparts on Omni-MMSI. Project page: https://sampson-lee.github.io/omni-mmsi-project-page.

  • How Much 3D Do Video Foundation Models Encode?

    arXiv (Cornell University) · 2025-12-23

    preprint · Open access · Senior author

    Videos are continuous 2D projections of 3D worlds. After training on large video data, will global 3D understanding naturally emerge? We study this by quantifying the 3D understanding of existing Video Foundation Models (VidFMs) pretrained on vast video data. We propose the first model-agnostic framework that measures the 3D awareness of various VidFMs by estimating multiple 3D properties from their features via shallow read-outs. Our study presents meaningful findings regarding the 3D awareness of VidFMs on multiple axes. In particular, we show that state-of-the-art video generation models exhibit a strong understanding of 3D objects and scenes, despite not being trained on any 3D data. Such understanding can even surpass that of large expert models specifically trained for 3D tasks. Our findings, together with the 3D benchmarking of major VidFMs, provide valuable observations for building scalable 3D models.

  • Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

    International Journal of Computer Vision · 2025-11-24 · 1 citation

    article

Frequent coauthors

  • Miao Liu

    Shandong University

    31 shared
  • Agata Rozga

    Georgia Institute of Technology

    27 shared
  • Stefan Stojanov

    26 shared
  • Eunji Chong

    Amazon (Germany)

    24 shared
  • Audrey Southerland

    Georgia Institute of Technology

    24 shared
  • Fiona Ryan

    Georgia Institute of Technology

    23 shared
  • Yin Li

    Southwest University

    23 shared
  • Anh Thai

    21 shared

Education

  • Ph.D., Computer Science

    University of California, Berkeley

    1991
  • M.S., Computer Science

    University of California, Berkeley

    1986
  • B.S., Computer Science

    University of California, Santa Barbara

    1983