Romit Roy Choudhury

Professor, Electrical and Computer Engineering

University of Illinois Urbana-Champaign · Computer Science

Active 2002–2026

h-index: 51
Citations: 11.4k
Papers: 201 (19 in the last 5 years)
Funding: $3.7M

About

Romit Roy Choudhury is a professor in the Electrical and Computer Engineering (ECE) and Computer Science (CS) departments at the University of Illinois. He earned his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2006. His research interests include generative models, inverse problems, neural imaging such as NeRFs, blackbox optimization, wireless sensing, and signal processing. He has been an Amazon Scholar since 2022, was a Visiting Principal Scientist at the Samsung AI Center in Cambridge, UK, in Fall 2019, and has been a faculty member at the University of Illinois since August 2017. Before that, he was an associate professor at Duke University and held research positions at Microsoft Research and Intel. His research group works on artificial intelligence, systems, and networking, with an emphasis on imaging and sensing technologies. He has been listed on the campus list of teachers ranked as excellent by students in multiple recent years.

Research topics

  • Computer Science
  • Human–computer interaction
  • Speech recognition
  • Acoustics
  • Data science
  • Embedded system
  • Telecommunications
  • Computer vision
  • Business
  • Engineering
  • Computer network
  • Physics

Selected publications

  • Personalized Image Generation via Human-in-the-loop Bayesian Optimization

    Open MIND · 2026-02-02

    preprint · Senior author

    Imagine Alice has a specific image $x^\ast$ in her mind, say, the view of the street in which she grew up during her childhood. To generate that exact image, she guides a generative model with multiple rounds of prompting and arrives at an image $x^{p*}$. Although $x^{p*}$ is reasonably close to $x^\ast$, Alice finds it difficult to close that gap using language prompts. This paper aims to narrow this gap by observing that even after language has reached its limits, humans can still tell when a new image $x^+$ is closer to $x^\ast$ than $x^{p*}$. Leveraging this observation, we develop MultiBO (Multi-Choice Preferential Bayesian Optimization) that carefully generates $K$ new images as a function of $x^{p*}$, gets preferential feedback from the user, uses the feedback to guide the diffusion model, and ultimately generates a new set of $K$ images. We show that within $B$ rounds of user feedback, it is possible to arrive much closer to $x^\ast$, even though the generative model has no information about $x^\ast$. Qualitative scores from $30$ users, combined with quantitative metrics compared across $5$ baselines, show promising results, suggesting that multi-choice feedback from humans can be effectively harnessed for personalized image generation.

  • Unified Diffusion Refinement for Multi-Channel Speech Enhancement and Separation

    ArXiv.org · 2026-03-25

    article · Open access · Senior author

    We propose Uni-ArrayDPS, a novel diffusion-based refinement framework for unified multi-channel speech enhancement and separation. Existing methods for multi-channel speech enhancement/separation are mostly discriminative and are highly effective at producing high-SNR outputs. However, they can still generate unnatural speech with non-linear distortions caused by the neural network and regression-based objectives. To address this issue, we propose Uni-ArrayDPS, which refines the outputs of any strong discriminative model using a speech diffusion prior. Uni-ArrayDPS is generative, array-agnostic, and training-free, and supports both enhancement and separation. Given a discriminative model's enhanced/separated speech, we use it, together with the noisy mixtures, to estimate the noise spatial covariance matrix (SCM). We then use this SCM to compute the likelihood required for diffusion posterior sampling of the clean speech source(s). Uni-ArrayDPS requires only a pre-trained clean-speech diffusion model as a prior and does not require additional training or fine-tuning, allowing it to generalize directly across tasks (enhancement/separation), microphone array geometries, and discriminative model backbones. Extensive experiments show that Uni-ArrayDPS consistently improves a wide range of discriminative models for both enhancement and separation tasks. We also report strong results on a real-world dataset. Audio demos are provided at https://xzwy.github.io/Uni-ArrayDPS/.

  • Discrete Langevin-Inspired Posterior Sampling

    ArXiv.org · 2026-05-10

    article · Open access · Senior author

    We study posterior sampling for inverse problems in discrete state spaces using discrete diffusion models as generative priors. While continuous diffusion models have become widely used for inverse problems, their discrete counterparts remain comparatively underexplored. Existing discrete posterior samplers often rely on continuous relaxations of discrete variables, Gibbs-style updates, or mechanisms specialized to particular corruption processes, which can limit scalability or generality. We propose $Δ$LPS, a Discrete Langevin-Inspired Posterior Sampler that uses gradient information to identify promising discrete moves without leaving the discrete state space. The resulting approach enables efficient parallel updates across all token dimensions and is agnostic to the training paradigm of the discrete diffusion prior, including masked and uniform-state diffusion. We evaluate our method on image restoration tasks across MNIST, CIFAR, and FFHQ, as well as spatial mapping, covering linear, nonlinear, and blind inverse problems. Across these settings, we improve over recent discrete diffusion posterior samplers and are competitive with strong continuous diffusion-based inverse solvers. Our results suggest that fully discrete, gradient-informed posterior samplers offer a scalable and general path toward solving inverse problems over discrete representations.

  • Dependency-Aware Discrete Diffusion for Scene Graph Generation

    arXiv (Cornell University) · 2026-05-09

    preprint · Open access · Senior author

    Scene graphs (SGs) represent objects and their relationships as structured graphs, enabling applications in image generation, robotics, and 3D understanding. Recent work suggests that conditioning image generation on scene graphs improves compositional fidelity compared to text-only prompting. However, since users typically provide text rather than structured graphs, a key challenge is to generate scene graphs from natural language. Prior work on discrete diffusion has demonstrated success in generating generic graphs such as molecules and circuits, but fails to account for the hierarchical structure and strong dependencies between objects, edges, and relations in scene graphs. We address this limitation by introducing a dependency-aware, hierarchically constrained discrete diffusion model for scene graph generation. Our approach decouples structure and semantics across the forward and reverse processes, enabling the model to capture conditional dependencies. At inference time, we perform training-free conditioning to sample text-aligned scene graphs. We evaluate our method on standard SG benchmarks and demonstrate improvements over both continuous and discrete graph generation baselines across graph and layout metrics. When fed to downstream image generation, our approach yields improved compositional alignment compared to text-to-image models, particularly in multi-object scenarios.

  • AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking

    arXiv (Cornell University) · 2026-01-25

    preprint · Open access

    Internet audio-visual clips convey meaning through time-varying sound and motion, which extend beyond what text alone can represent. To examine whether AI models can understand such signals in human cultural contexts, we introduce AVMeme Exam, a human-curated benchmark of over one thousand iconic Internet sounds and videos spanning speech, songs, music, and sound effects. Each meme is paired with a unique Q&A assessing levels of understanding from surface content to context and emotion to usage and world knowledge, along with metadata such as original year, transcript, summary, and sensitivity. We systematically evaluate state-of-the-art multimodal large language models (MLLMs) alongside human participants using this benchmark. Our results reveal a consistent limitation: current models perform poorly on textless music and sound effects, and struggle to think in context and in culture compared to surface content. These findings highlight a key gap in human-aligned multimodal intelligence and call for models that can perceive contextually and culturally beyond the surface of what they hear and see. Project page: avmemeexam.github.io/public

  • Inferring Indoor Layouts using Audio

    2026-02-25

    article · Open access · Senior author

    Cameras and LiDARs underlie today's established tools for inferring indoor layouts. This paper explores audio as a complementary modality for this task. Our system emits short audio beacons from a handheld device (e.g., a smartphone) and records the resulting echoes at multiple, known locations along a user's path. Given these multi-position measurements, we infer the indoor layout, in the form of a 2D floorplan, using a generative approach. Our method employs a conditional GAN (CGAN) to synthesize feasible layouts while incorporating knowledge of indoor acoustic signal propagation to regularize training and avoid overfitting. We train on large-scale, high-fidelity simulations spanning diverse geometries, materials, and noise, then evaluate zero-shot in real homes and offices with no additional training. Results show accurate 2D floorplans with strong precision and recall, demonstrating audio's promise as a robust, privacy-preserving complement to vision and LiDAR.

  • Poster: Inferring Floorplans from Walking Trajectories via Contrastive Diffusion Guidance

    2026-02-25

    article · Open access · Senior author

    Introduction. Consider a user walking around in her home for a few minutes. Using some sensor, e.g., a smartphone, the user's trajectory has been recorded. This trajectory is a sequence of location measurements inside the home, $y = [y_1, y_2, \dots, y_n]$, along which the user has walked (shown in Fig. 1). We ask, given this trajectory measurement, is it possible to infer the floorplan $x$ of the home? The problem is non-trivial because an infinite number of floorplans are candidate solutions for the given trajectories; how can one identify the correct floorplan? We tackle this using generative diffusion models that learn realistic floorplan structures from data, and then generate a layout compatible with the observed trajectory. Our method Diff2Plan builds on DDPM [2] and shows robustness to sparse, medium, and dense trajectories. Across synthetic and real-world UWB trajectories, Diff2Plan outperforms baselines such as DPS [1] and CFG [3] and degrades gracefully when measurements are limited. The resulting capability can enable practical applications such as home digital twins, context-aware assistants, and AR/VR.

Frequent coauthors

  • Srihari Nelakuditi

    University of South Carolina

    37 shared
  • Souvik Sen

    Baker Hughes (Germany)

    29 shared
  • Xuan Bao

    Tianjin University

    21 shared
  • Justin Manweiler

    20 shared
  • Pier Giorgio Masci

    20 shared
  • Giovanni Donato Aquaro

    18 shared
  • Mahanth Gowda

    Pennsylvania State University

    17 shared
  • Perry Elliott

    St Bartholomew's Hospital

    16 shared

Labs

  • Siebel School of Computing and Data Science · PI

Awards & honors

  • Campus List of Teachers Ranked as Excellent by their Student…
  • Campus List of Teachers Ranked as Excellent by their Student…
  • Campus List of Teachers Ranked as Excellent by their Student…
  • Campus List of Teachers Ranked as Excellent by their Student…
  • Campus List of Teachers Ranked as Excellent by their Student…