James H Tompkin
Associate Professor of Computer Science · Brown University · Computer Science
Active 2009–2026
About
James Tompkin is an Associate Professor specializing in Visual Computing, with research interests spanning computer vision, computer graphics, and human-computer interaction. His lab focuses on developing advanced techniques for image and video creation, editing, analysis, and interaction. This work involves image and scene reconstruction methods, particularly from multi-camera systems and complex dynamic scenes, with applications across 2D, multi-view, and VR/AR display technologies. His research aims to enhance the capabilities of visual computing systems by improving how images and scenes are captured, synthesized, and manipulated for various interactive and immersive environments.
Research topics
- Artificial Intelligence
- Computer Science
- Computer graphics (images)
Selected publications
niiv: Interactive Self-supervised Neural Implicit Isotropic Volume Reconstruction
Lecture notes in computer science · 2026-01-01
book-chapter · Senior author
TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement
ArXiv.org · 2026-01-19
article · Open access · Senior author
Aerial remote sensing enables efficient large-area surveying, but accurate direct object-level measurement remains difficult in complex natural scenes. Recent advancements in 3D vision, particularly learned radiance-field representations such as NeRF and 3D Gaussian Splatting, have begun to raise the ceiling on reconstruction fidelity and densifiable geometry from posed imagery. Nevertheless, direct aerial measurement of important natural attributes such as tree diameter at breast height (DBH) remains challenging. Trunks in aerial forest scans are distant and sparsely observed in image views: at typical operating altitudes, stems may span only a few pixels. With these constraints, conventional reconstruction methods leave breast-height trunk geometry weakly constrained. We present TreeDGS, an aerial image reconstruction method that leverages 3D Gaussian Splatting as a continuous, densifiable scene representation for trunk measurement. After SfM–MVS initialization and Gaussian optimization, we extract a dense point set from the Gaussian field using RaDe-GS's depth-aware cumulative-opacity integration and associate each sample with a multi-view opacity reliability score. Then, we estimate DBH from trunk-isolated points using opacity-weighted solid-circle fitting. Evaluated on 10 plots with field-measured DBH, TreeDGS reaches 4.79 cm RMSE (about 2.6 pixels at this GSD) and outperforms a state-of-the-art LiDAR baseline (7.91 cm RMSE). This shows that TreeDGS can enable accurate, low-cost aerial DBH measurement.
TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement
arXiv (Cornell University) · 2026-01-19
preprint · Open access · Senior author
Aerial remote sensing efficiently surveys large areas, but accurate direct object-level measurement remains difficult in complex natural scenes. Advancements in 3D computer vision, particularly radiance field representations such as NeRF and 3D Gaussian splatting, can improve reconstruction fidelity from posed imagery. Nevertheless, direct aerial measurement of important attributes like tree diameter at breast height (DBH) remains challenging. Trunks in aerial forest scans are distant and sparsely observed in image views; at typical operating altitudes, stems may span only a few pixels. With these constraints, conventional reconstruction methods have inaccurate breast-height trunk geometry. TreeDGS is an aerial image reconstruction method that uses 3D Gaussian splatting as a continuous scene representation for trunk measurement. After SfM–MVS initialization and Gaussian optimization, we extract a dense point set from the Gaussian field using RaDe-GS's depth-aware cumulative-opacity integration and associate each sample with a multi-view opacity reliability score. Then, we isolate trunk points and estimate DBH using opacity-weighted solid-circle fitting. Evaluated on 10 plots with field-measured DBH, TreeDGS reaches 4.79 cm RMSE (about 2.6 pixels at this GSD) and outperforms a LiDAR baseline (7.66 cm RMSE). This shows that TreeDGS can enable accurate, low-cost aerial DBH measurement.
TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement
Remote Sensing · 2026-03-11 · 1 citation
article · Open access · Senior author
Aerial remote sensing efficiently surveys large areas, but accurate direct object-level measurement remains difficult in complex natural scenes. Advancements in 3D computer vision, particularly radiance field representations such as NeRF and 3D Gaussian splatting, can improve reconstruction fidelity from posed imagery. Nevertheless, direct aerial measurement of important attributes like tree diameter at breast height (DBH) remains challenging. Trunks in aerial forest scans are distant and sparsely observed in image views; at typical operating altitudes, stems may span only a few pixels. With these constraints, conventional reconstruction methods have inaccurate breast-height trunk geometry. TreeDGS is an aerial image reconstruction method that uses 3D Gaussian splatting as a continuous scene representation for trunk measurement. After SfM–MVS initialization and Gaussian optimization, we extract a dense point set from the Gaussian field using RaDe-GS’s depth-aware cumulative-opacity integration and associate each sample with a multi-view opacity reliability score. Then, we isolate trunk points and estimate DBH using opacity-weighted solid-circle fitting. Evaluated on 10 plots with field-measured DBH, TreeDGS reaches 4.79 cm RMSE (about 2.6 pixels at this GSD) and outperforms a LiDAR baseline (7.66 cm RMSE). This shows that TreeDGS can enable accurate, low-cost aerial DBH measurement.
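To make the final measurement step in the abstract more concrete, here is a minimal sketch of fitting a circle to trunk cross-section points with per-point weights. It uses a weighted algebraic (Kasa) least-squares fit, which is a stand-in and not the paper's "solid-circle" formulation; the function names, the 2D slice input, and the use of opacity scores as weights are all assumptions made for illustration.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(M, v):
    """Solve the 3x3 system M p = v with Cramer's rule."""
    d = det3(M)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set (collinear or too few points)")
    solution = []
    for k in range(3):
        # Replace column k of M with v.
        Mk = [[v[i] if j == k else M[i][j] for j in range(3)] for i in range(3)]
        solution.append(det3(Mk) / d)
    return solution


def fit_circle_weighted(points, weights):
    """Weighted algebraic circle fit (hypothetical helper, not from the paper).

    Minimizes sum_i w_i * (x_i^2 + y_i^2 + D*x_i + E*y_i + F)^2 and returns
    (center_x, center_y, radius).
    """
    M = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for (x, y), w in zip(points, weights):
        row = (x, y, 1.0)
        z = x * x + y * y
        for i in range(3):
            v[i] -= w * row[i] * z
            for j in range(3):
                M[i][j] += w * row[i] * row[j]
    D, E, F = solve3(M, v)
    cx, cy = -D / 2.0, -E / 2.0
    return cx, cy, math.sqrt(cx * cx + cy * cy - F)


if __name__ == "__main__":
    # Synthetic slice: points on a trunk of radius 0.15 m centred at (2, -1).
    pts = [(2 + 0.15 * math.cos(t), -1 + 0.15 * math.sin(t))
           for t in (0.0, 0.7, 1.5, 2.4, 3.1, 4.0, 5.2)]
    opacity = [0.9, 0.8, 0.4, 0.95, 0.6, 0.7, 0.3]  # assumed reliability scores
    cx, cy, r = fit_circle_weighted(pts, opacity)
    print("estimated DBH (m):", 2 * r)
```

The diameter estimate is twice the fitted radius. Down-weighting low-opacity samples means points the reconstruction is less sure about pull the fit less, which is the general idea behind an opacity-weighted estimate.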
Efficient Object Reconstruction with Differentiable Area Light Shading
2025-12-08 · 1 citation
article
In 3D object reconstruction from photographs, estimating material properties is challenging. We propose an inverse rendering method that uses active area lighting: as this provides a wider range of lighting angles per photo than point lighting, material reconstruction can be more accurate for the same number of photos. We compare area light shading with point lighting. With either mesh or 3D Gaussian splatting pipelines, area lighting can improve BRDF reconstruction and leads to +3 dB relighting PSNR over point lights, or needs only 1/5 of the input photos for the same quality. We also compare area light shading with Monte Carlo ray tracing and with differential linearly transformed cosines (LTC) plus shadow visibility weighting. LTC can be faster, improving optimization times by 25%. In SOTA method-level comparisons, our approach improves material reconstruction, particularly for material roughness, leading to superior relighting quality.
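As a rough reading of the +3 dB figure, assuming the standard PSNR definition (the abstract does not say which variant is used; the symbols MAX and MSE below are that assumption), a 3 dB gain corresponds to roughly halving the mean squared relighting error:

```latex
\[
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
\]
\[
\mathrm{PSNR}_{\text{area}} - \mathrm{PSNR}_{\text{point}}
  = 10 \log_{10}\!\left(\frac{\mathrm{MSE}_{\text{point}}}{\mathrm{MSE}_{\text{area}}}\right)
  = 3\ \text{dB}
\quad\Longrightarrow\quad
\frac{\mathrm{MSE}_{\text{point}}}{\mathrm{MSE}_{\text{area}}} = 10^{0.3} \approx 2.0
\]
```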
GauFRe: Gaussian Deformation Fields for Real-Time Dynamic Novel View Synthesis
2025-02-26 · 25 citations
article
We propose a method that achieves state-of-the-art rendering quality and efficiency on monocular dynamic scene reconstruction using deformable 3D Gaussians. Implicit deformable representations commonly model motion with a canonical space and time-dependent backward-warping deformation field. Our method, GauFRe, uses a forward-warping deformation to explicitly model non-rigid transformations of scene geometry. Specifically, we propose a template set of 3D Gaussians residing in a canonical space, and a time-dependent forward-warping deformation field to model dynamic objects. Additionally, we tailor a 3D Gaussian-specific static component supported by an inductive bias-aware initialization approach which allows the deformation field to focus on moving scene regions, improving the rendering of complex real-world motion. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Experiments show our method achieves competitive results and higher efficiency than both previous state-of-the-art NeRF and Gaussian-based methods. For real-world scenes, GauFRe can train in ≈20 mins and offer 96 FPS real-time rendering on an RTX 3090 GPU.
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
ArXiv.org · 2025-05-30
preprint · Open access
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for post-training large language models (LLMs), achieving state-of-the-art performance on tasks with structured, verifiable answers. Applying RLVR to Multimodal LLMs (MLLMs) presents significant opportunities but is complicated by the broader, heterogeneous nature of vision-language tasks that demand nuanced visual, logical, and spatial capabilities. As such, training MLLMs using RLVR on multiple datasets could be beneficial but creates challenges with conflicting objectives from interaction among diverse datasets, highlighting the need for optimal dataset mixture strategies to improve generalization and reasoning. We introduce a systematic post-training framework for Multimodal LLM RLVR, featuring a rigorous data mixture problem formulation and benchmark implementation. Specifically, (1) We developed a multimodal RLVR framework for multi-dataset post-training by curating a dataset that contains different verifiable vision-language problems and enabling multi-domain online RL learning with different verifiable rewards; (2) We proposed a data mixture strategy that learns to predict the RL fine-tuning outcome from the data mixture distribution, and consequently optimizes the best mixture. Comprehensive experiments showcase that multi-domain RLVR training, when combined with mixture prediction strategies, can significantly boost MLLM general reasoning capacities. Our best mixture improves the post-trained model's accuracy on out-of-distribution benchmarks by an average of 5.24% compared to the same model post-trained with uniform data mixture, and by a total of 20.74% compared to the pre-finetuning baseline.
Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields
2025-06-10 · 1 citation
article · Senior author
We present a method to reconstruct dynamic scenes from monocular continuous-wave time-of-flight (C-ToF) cameras using raw sensor samples that achieves similar or better accuracy than neural volumetric approaches and is 100× faster. Quickly achieving high-fidelity dynamic 3D reconstruction from a single viewpoint is a significant challenge in computer vision. In C-ToF radiance field reconstruction, the property of interest (depth) is not directly measured, causing an additional challenge. This problem has a large and underappreciated impact upon the optimization when using a fast primitive-based scene representation like 3D Gaussian splatting, which is commonly used with multi-view data to produce satisfactory results and is brittle in its optimization otherwise. We incorporate two heuristics into the optimization to improve the accuracy of scene geometry represented by Gaussians. Experimental results show that our approach produces accurate reconstructions under constrained C-ToF sensing conditions, including for fast motions like swinging baseball bats. https://visual.cs.brown.edu/gftorf
InfoVids: Reimagining the Viewer Experience with Alternative Visualization-Presenter Relationships
ArXiv.org · 2025-05-06
preprint · Open access
Traditional data presentations typically separate the presenter and visualization into two separate spaces (the 3D world and a 2D screen), enforcing visualization-centric stories. To create a more human-centric viewing experience, we establish a more equitable relationship between the visualization and the presenter through our InfoVids. These infographics-inspired informational videos are crafted to redefine relationships between the presenter and visualizations. As we design InfoVids, we explore how the use of layout, form, and interactions affects the viewer experience. We compare InfoVids against their baseline 2D 'slides' equivalents across 9 metrics with 30 participants and provide practical, long-term insights from an autobiographical perspective. Our mixed methods analyses reveal that this paradigm reduced viewer attention splitting, shifted the focus from the visualization to the presenter, and led to more interactive, natural, and engaging full-body data performances for viewers. Ultimately, InfoVids helped viewers re-imagine traditional dynamics between the presenter and visualizations.
Zero-Shot Monocular Scene Flow Estimation in the Wild
ArXiv.org · 2025-01-17
preprint · Open access
Large models have shown generalization across datasets for many low-level vision tasks, like depth estimation, but no such general models exist for scene flow. Even though scene flow has wide potential use, it is not used in practice because current predictive models do not generalize well. We identify three key challenges and propose solutions for each. First, we create a method that jointly estimates geometry and motion for accurate prediction. Second, we alleviate scene flow data scarcity with a data recipe that affords us 1M annotated training samples across diverse synthetic scenes. Third, we evaluate different parameterizations for scene flow prediction and adopt a natural and effective parameterization. Our resulting model outperforms existing methods as well as baselines built on large-scale models in terms of 3D end-point error, and shows zero-shot generalization to the casually captured videos from DAVIS and the robotic manipulation scenes from RoboTAP. Overall, our approach makes scene flow prediction more practical in-the-wild.
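For readers unfamiliar with the metric named in this abstract, 3D end-point error is commonly computed as the mean Euclidean distance between predicted and reference 3D flow vectors. The sketch below assumes that common definition; the function name and the list-of-tuples input format are illustrative and not taken from the paper.

```python
import math


def end_point_error_3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and reference 3D flow vectors.

    Both arguments are equal-length sequences of (dx, dy, dz) tuples
    (an assumed input format for this illustration).
    """
    if len(pred_flow) != len(gt_flow) or not pred_flow:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow))
    return total / len(pred_flow)


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
    print(end_point_error_3d(predicted, reference))
```

Lower values are better; published evaluations often add masking or thresholded accuracy variants, which this sketch leaves out.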
Frequent coauthors
- Kwang In Kim · Pohang University of Science and Technology · 42 shared
- Hanspeter Pfister · Harvard University · 40 shared
- Christian Theobalt · 39 shared
- Jan Kautz · Nvidia (United States) · 30 shared
- Christian Richardt · 23 shared
- Min H. Kim · 22 shared
- Stefanie Tellex · Brown University · 22 shared
- Aaron Gokaslan · 18 shared