Aditi Majumder
· Professor · Verified · University of California, Irvine · Computer Science
Active 1965–2026
About
Aditi Majumder is a professor in the Department of Computer Science at UC Irvine's Donald Bren School of Information & Computer Sciences. Her research focuses on generating, capturing, representing, rendering, and interacting with synthetic and real-world images and video. She has developed a suite of mathematical models, methods, and software aimed at producing seamless images on large-scale tiled displays, addressing important problems in both scientific and entertainment fields. Her work involves correcting the geometric, chromatic, and luminance variations that arise when tiling multiple projection displays, contributing to advancements in computer graphics and vision.
Research topics
- Computer Science
- Political Science
- Artificial Intelligence
- Psychology
- Public relations
- Pathology
- Immunology
- Algorithm
- Biology
- Medicine
- Social psychology
- Geography
- Bioinformatics
- Meteorology
Selected publications
Evaluating Spatialized Auditory Cues for Rapid Attention Capture in XR
2026-03-21
article · Open access · Senior author
In time-critical eXtended reality (XR) scenarios where users must rapidly reorient their attention to hazards, alerts, or instructions while engaged in a primary task, spatial audio can provide an immediate directional cue without occupying visual bandwidth. However, such scenarios can afford only a brief auditory exposure, requiring users to interpret sound direction quickly and without extended listening or head-driven refinement. This paper reports a controlled exploratory study of rapid spatial-audio localization in XR. Using HRTF-rendered broadband stimuli presented from a semi-dense set of directions around the listener, we quantify how accurately users can infer coarse direction from brief audio alone. We further examine the effects of short-term visuo-auditory feedback training as a lightweight calibration mechanism. Our findings show that brief spatial cues can convey coarse directional information, and that even short calibration can improve users' perception of aural signals. While these results highlight the potential of spatial audio for rapid attention guidance, they also show that auditory cues alone may not provide sufficient precision for complex or high-stakes tasks, and that spatial audio may be most effective when complemented by other sensory modalities or visual cues, without relying on head-driven refinement. We leverage this study on spatial audio as a preliminary investigation into a first-stage attention-guidance channel for wearable XR (e.g., VR head-mounted displays and AR smart glasses), and provide design insights on stimulus selection and calibration for time-critical use.
IEEE Transactions on Emerging Topics in Computing · 2025-11-25
article · Senior author
2D X-ray radiography is a widely used medical imaging technique due to its low cost, rapid acquisition, and low radiation exposure. However, X-ray images are limited to the 2D plane, making it difficult for physicians to visualize and assess the 3D anatomical information. We present Pixel2Voxel, a method to reconstruct 3D volume from a limited number (1 or 2) of 2D X-ray images for the visualization of 3D anatomical structures. Our approach backprojects the local image features onto their corresponding voxels and aggregates the 2D features from the images to get the 3D geometry-aware volume features. To address the inherent ambiguity stemming from sparse input data, we introduce a novel conditional diffusion model that seamlessly integrates with the 3D volume features. This integration significantly enhances our model's ability to generate high-fidelity and geometrically consistent 3D volumes. Furthermore, we propose an iterative refinement method that substantially improves the reconstruction quality. We evaluate our method through extensive experiments on real patient datasets, demonstrating its superior performance in 3D reconstruction compared to recent methods. Besides, we evaluate our method's capabilities in 3D organ shape reconstruction and volume measurement. Our promising results showcase its great potential across various medical applications, augmenting 2D X-rays by enabling assessment and visualization of 3D anatomical information, such as organ shape, size, and location, with lower cost and lower radiation exposure of X-rays.
Progressive Autoregressive Video Diffusion Models
2025-06-11 · 4 citations
article
Current frontier video diffusion models have demonstrated remarkable results at generating high-quality videos. However, they can only generate short video clips, normally around 10 seconds or 240 frames, due to computation limitations during training. Existing methods naively achieve autoregressive long video generation by directly placing the ending of the previous clip at the front of the attention window as conditioning, which leads to abrupt scene changes, unnatural motion, and error accumulation. In this work, we introduce a more natural formulation of autoregressive long video generation by revisiting the noise level assumption in video diffusion models. Our key idea is to (1) assign the frames with per-frame, progressively increasing noise levels rather than a single noise level and (2) denoise and shift the frames in small intervals rather than all at once. This allows for smoother attention correspondence among frames with adjacent noise levels, larger overlaps between the attention windows, and better propagation of information from the earlier to the later frames. Video diffusion models equipped with our progressive noise schedule can autoregressively generate long videos with much improved fidelity compared to the baselines and minimal quality degradation over time. We present the first results on text-conditioned 60-second (1440 frames) long video generation at a quality close to frontier models. Code and video results are available at https://desaixie.github.io/pa-vdm/.
ArXiv.org · 2025-01-23 · 1 citation
preprint · Open access · Senior author
We present Explainable XR, an end-to-end framework for analyzing user behavior in diverse eXtended Reality (XR) environments by leveraging Large Language Models (LLMs) for data interpretation assistance. Existing XR user analytics frameworks face challenges in handling cross-virtuality - AR, VR, MR - transitions, multi-user collaborative application scenarios, and the complexity of multimodal data. Explainable XR addresses these challenges by providing a virtuality-agnostic solution for the collection, analysis, and visualization of immersive sessions. We propose three main components in our framework: (1) A novel user data recording schema, called User Action Descriptor (UAD), that can capture the users' multimodal actions, along with their intents and the contexts; (2) a platform-agnostic XR session recorder, and (3) a visual analytics interface that offers LLM-assisted insights tailored to the analysts' perspectives, facilitating the exploration and analysis of the recorded XR session data. We demonstrate the versatility of Explainable XR by demonstrating five use-case scenarios, in both individual and collaborative XR applications across virtualities. Our technical evaluation and user studies show that Explainable XR provides a highly usable analytics solution for understanding user actions and delivering multifaceted, actionable insights into user behaviors in immersive environments.
IEEE Transactions on Visualization and Computer Graphics · 2025-05-22 · 1 citation
article · Senior author
Understanding visual attention is key to designing efficient human-computer interaction, especially for virtual reality (VR) and augmented reality (AR) applications. However, the relationship between 3D spatial attributes of visual stimuli and visual attention is still underexplored. Thus, we design an experiment to collect a gaze dataset in VR, and use it to quantitatively model the probability of first attention between two stimuli. First, we construct the dataset by presenting subjects with a synthetic VR scene containing varying spatial configurations of two spheres. Second, we formulate their selective attention based on a probability model that takes as input two view-specific stimuli attributes: their eccentricities in the field of view and their sizes as visual angles. Third, we train two models using our gaze dataset to predict the probability distribution of a user's preferences of visual stimuli within the scene. We evaluate our method by comparing model performance across two challenging synthetic scenes in VR. Our application case study demonstrates that VR designers can utilize our models for attention prediction in two-foreground-object scenarios, which are common when designing 3D content for storytelling or scene guidance. We make the dataset and the source code to visualize it available alongside this work.
IEEE Transactions on Visualization and Computer Graphics · 2025-03-10 · 17 citations
article · Senior author
We present Explainable XR, an end-to-end framework for analyzing user behavior in diverse eXtended Reality (XR) environments by leveraging Large Language Models (LLMs) for data interpretation assistance. Existing XR user analytics frameworks face challenges in handling cross-virtuality - AR, VR, MR - transitions, multi-user collaborative application scenarios, and the complexity of multimodal data. Explainable XR addresses these challenges by providing a virtuality-agnostic solution for the collection, analysis, and visualization of immersive sessions. We propose three main components in our framework: (1) A novel user data recording schema, called User Action Descriptor (UAD), that can capture the users' multimodal actions, along with their intents and the contexts; (2) a platform-agnostic XR session recorder, and (3) a visual analytics interface that offers LLM-assisted insights tailored to the analysts' perspectives, facilitating the exploration and analysis of the recorded XR session data. We demonstrate the versatility of Explainable XR by demonstrating five use-case scenarios, in both individual and collaborative XR applications across virtualities. Our technical evaluation and user studies show that Explainable XR provides a highly usable analytics solution for understanding user actions and delivering multifaceted, actionable insights into user behaviors in immersive environments.
Weather Climate and Society · 2025-09-08
article
Abstract: Disasters such as storm surge flooding pose an escalating threat to vulnerable coastal communities. While advances in weather models and forecasts are essential for informing protective actions, improving communication with the public for heightened storm preparedness is equally important. In this report, we provide a quantitative evaluation of lessons learned in an online workshop involving over 150 college students. The workshop employed simulated visuals of flooding and role-playing scenarios about a fictitious college campus. In addition, we used an “ethical matrix” (EM) tool to enable stakeholders to systematically represent, discuss, understand, and weigh trade-offs and perspectives pertaining to potential impacts of anticipated flooding from an impending hurricane. Building on a previous summary of the workshop (Colle et al.), this report presents quantitative and qualitative results from hypotheses about the workshop’s effects on feelings of worry, intent to take protective action, and increased awareness of others’ situations and concerns. These findings provide insights for refining hypotheses and designs for workshops with communities vulnerable to storm surge flooding.
Significance Statement: Traditionally, flood warnings rely on forecasts, storm path diagrams, and evacuation orders. Additional communication strategies are needed to help people grasp individual and communal impacts. This paper presents quantitative evidence that communication strategies that help people “feel” the likely impact of flooding not only on themselves but also on those around them are associated with intent to protect oneself and others, including vulnerable populations. We also provide a toolkit of supplemental material for replicating some or all of the approaches tested.
Lecture notes in computer science · 2025-09-19
book-chapter · Open access · Senior author
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
2024-01-01
article
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
2024-06-16 · 9 citations
article · Senior author
Multi-view diffusion models, obtained by applying Supervised Finetuning (SFT) to text-to-image diffusion models, have driven recent breakthroughs in text-to-3D research. However, due to the limited size and quality of existing 3D datasets, they still suffer from multi-view inconsistencies and Neural Radiance Field (NeRF) reconstruction artifacts. We argue that multi-view diffusion models can benefit from further Reinforcement Learning Finetuning (RLFT), which allows models to learn from the data generated by themselves and improve beyond their dataset limitations during SFT. To this end, we introduce Carve3D, an improved RLFT algorithm coupled with a novel Multi-view Reconstruction Consistency (MRC) metric, to enhance the consistency of multi-view diffusion models. To measure the MRC metric on a set of multi-view images, we compare them with their corresponding NeRF renderings at the same camera viewpoints. The resulting model, which we denote as Carve3DM, demonstrates superior multi-view consistency and NeRF reconstruction quality than existing models. Our results suggest that pairing SFT with Carve3D's RLFT is essential for developing multi-view-consistent diffusion models, mirroring the standard Large Language Model (LLM) alignment pipeline. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d.
Frequent coauthors
- 161 shared
Klaus Mueller
- 147 shared
Miriah Meyer
- 147 shared
Aditi Majumder
University of California, Irvine
- 141 shared
Hanspeter Pfister
Harvard University
- 116 shared
Amitabh Varshney
University of Maryland, College Park
- 103 shared
Vice Chair
University of Utah
- 101 shared
Robert Moorhead
- 99 shared
Cláudio Silva
Hospital das Clínicas da Faculdade de Medicina da Universidade de São Paulo