
Jingjing (Jing) Huang
Assistant Professor of Accounting and Information Systems · Verified · Virginia Tech · Accounting
Active 1993–2026
About
Dr. Jingjing (Jing) Huang joined the Pamplin College of Business at Virginia Tech in the Fall of 2014. She holds an Accounting Ph.D. from the University of Oregon, a Master’s Degree in Accounting from Iowa State University, and a Bachelor’s Degree in Business Administration from Shanghai University of Electric Power. She is licensed as a CPA in the state of Iowa. Her research interests include tax, financial accounting, corporate finance, and R&D innovation. Her dissertation examines the role of taxes in foreign earnings management for multinational companies, and her working paper investigates how companies trade off tax incentives against nontax costs in R&D investment decisions. Dr. Huang has previously taught at the University of Oregon and has worked as a federal tax associate at KPMG in Des Moines, Iowa, as well as an accounting intern at Deloitte and Touche in London and at HNI Corporation in Iowa and Hong Kong.
Research topics
- Artificial Intelligence
- Computer Science
- Data Mining
- Computer vision
- Mathematics
- Computer graphics (images)
Selected publications
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
2026-03-06
article · Open access
Multi-shot generation requires preserving the identity of characters and settings across frames. Cinematic scene composition goes beyond standard multi-shot generation, introducing additional challenges such as expressing complex interactions among multiple characters and visual effects to convey creative narratives, challenges that existing datasets cannot fully address. We present CineVerse, a large-scale dataset of diverse movie scenes labeled with shot-level annotations tailored for filmmaking. CineVerse includes refined scene descriptions, shot-type information, and newly extracted shot, character, and setting descriptions. We validate our dataset by developing a baseline framework that first generates a scene plan containing detailed information for the overall scene and each individual shot, then produces a set of coherent keyframes. Our results show significant improvements in controlling and synthesizing cinematic content through the added context provided by CineVerse.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
2025-06-10 · 10 citations
article
Video super-resolution (VSR) models achieve temporal consistency but often produce blurrier results than their image-based counterparts due to limited generative capacity. This prompts the question: can we adapt a generative image upsampler for VSR while preserving temporal consistency? We introduce VideoGigaGAN, a new generative VSR model that combines high-frequency detail with temporal stability, building on the large-scale GigaGAN image upsampler. Simple adaptations of GigaGAN for VSR led to flickering issues, so we propose techniques to enhance temporal consistency. We validate the effectiveness of VideoGigaGAN by comparing it with state-of-the-art VSR models on public datasets and showcasing video results with 8× upsampling.
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
arXiv (Cornell University) · 2025-12-05
preprint · Open access
Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from training VLMs on static internet-scale visual-language data that contain no causal interactions or action-conditioned changes. Consequently, it remains challenging to leverage VLMs for fine-grained robotic manipulation tasks that require physical understanding, reasoning, and corresponding action planning. To overcome this, we present SIMPACT, a test-time, SIMulation-enabled ACTion Planning framework that equips VLMs with physical reasoning through simulation-in-the-loop world modeling, without requiring any additional training. From a single RGB-D observation, SIMPACT efficiently constructs physics simulations, enabling the VLM to propose informed actions, observe simulated rollouts, and iteratively refine its reasoning. By integrating language reasoning with physics prediction, our simulation-enabled VLM can understand contact dynamics and action outcomes in a physically grounded way. Our method demonstrates state-of-the-art performance on five challenging, real-world rigid-body and deformable manipulation tasks that require fine-grained physical reasoning, outperforming existing general-purpose robotic manipulation models. Our results demonstrate that embedding physics understanding via efficient simulation into VLM reasoning at test time offers a promising path towards generalizable embodied intelligence. The project webpage is at https://simpact-bot.github.io
MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects
2025-06-10
article
We introduce MaDCoW, a method for correcting marginal distortion of arbitrary objects in wide-angle photography. People often use wide-angle photography (it is the default in smartphone cameras), but very wide fields of view distort the appearance of objects in the image margins. In our system, a user annotates straight lines and regions of interest. MaDCoW solves for a separate linear perspective projection for each region and then jointly solves for a distortion-minimizing projection for the whole photograph. We show that MaDCoW can produce good results in cases where previous methods yield visible distortions.
Quality Assessment of DIBR-Synthesized Image Based on Multi-Scale Feature Fusion Deep Neural Network
2025-01-10
article · Senior author
As Depth Image-Based Rendering (DIBR) technology has improved, more complex distortions (e.g., stretching distortion) have appeared in DIBR-synthesized images, and DIBR Image Quality Assessment (IQA) algorithms previously designed for specific distortions (e.g., black holes, blurring) have become ineffective. This paper proposes an effective metric for the new DIBR-synthesized image dataset, which contains the most current distortion types. To simulate the multi-scale visual properties of the human eye, we use a Gaussian pyramid to obtain a multi-scale representation of an image, fuse features of the reference and synthesized images using the pre-trained convolutional neural network DenseNet201, and finally use cosine similarity to measure the difference between the feature maps as the deviation between the reference and synthesized images. Experimental results on the latest DIBR-synthesized image dataset show that the proposed model is highly competitive with other IQA methods.
UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video
2025-03-25 · 1 citation
article
We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions with a single video. It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras, differing from NeRF's dense view settings. In this context, standard methods often yield subpar geometry and material estimates, such as inaccurate roof representations and numerous ‘floaters’. UrbanIR addresses these issues with novel losses that reduce errors in inverse graphics inference and rendering artifacts. Its techniques allow for precise shadow volume estimation in the original scene. The model's outputs support controllable editing, enabling photorealistic free-viewpoint renderings of night simulations, relit scenes, and inserted objects, marking a significant improvement over existing state-of-the-art methods. Our code and data will be made publicly available upon acceptance.
IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images
2025-06-10 · 2 citations
article
Inverse rendering seeks to recover 3D geometry, surface material, and lighting from captured images, enabling advanced applications such as novel-view synthesis, relighting, and virtual object insertion. However, most existing techniques rely on high dynamic range (HDR) images as input, limiting accessibility for general users. In response, we introduce IRIS, an inverse rendering framework that recovers the physically based material, spatially-varying HDR lighting, and camera response functions from multi-view, low-dynamic-range (LDR) images. By eliminating the dependence on HDR input, we make inverse rendering technology more accessible. We evaluate our approach on real-world and synthetic scenes and compare it with state-of-the-art methods. Our results show that IRIS effectively recovers HDR lighting, accurate material, and plausible camera response functions, supporting photorealistic relighting and object insertion.
2025-01-16
peer-review
A bird target detection model designed for substation scenarios
Scientific Reports · 2025-11-06
article · Open access
Substations are critical to the power grid, but they are often disturbed by bird activity, which can lead to power failures and outages. Conventional bird-detection methods are costly and lack long-term effectiveness. To address these issues, this paper proposes a bird target detection method, YOLO-birds, designed explicitly for substation scenarios. It utilizes the Faster-BiFPN module to optimize feature extraction and fusion by combining low-level and high-level features, thereby enhancing detection accuracy. In addition, the SPPBiF attention mechanism is introduced to address the challenge of detecting targets at different scales and small objects, such as birds. To further improve robustness, the Focal-EIoU loss function is used to mitigate the effect of low-quality samples. To support this research and improve detection performance in real-world scenarios, a self-constructed dataset of bird images focusing on security threats at substations was created. Experimental results show that YOLO-birds achieves a mAP50 of 90.2%, which validates the effectiveness of the proposed method compared to other methods. The method can efficiently detect birds inhabiting substations, which can help support differentiated prevention of bird-caused accidents in power grids.
TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos
ArXiv.org · 2025-11-26
preprint · Open access
Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - humans and different robots - are abundant, differences in embodiment, camera, and environment hinder their direct use. We address the small-data problem by introducing a unifying, symbolic representation - a compact 3D "trace-space" of scene-level trajectories - that enables learning from cross-embodiment, cross-environment, and cross-task videos. We present TraceGen, a world model that predicts future motion in trace-space rather than pixel space, abstracting away appearance while retaining the geometric structure needed for manipulation. To train TraceGen at scale, we develop TraceForge, a data pipeline that transforms heterogeneous human and robot videos into consistent 3D traces, yielding a corpus of 123K videos and 1.8M observation-trace-language triplets. Pretraining on this corpus produces a transferable 3D motion prior that adapts efficiently: with just five target robot videos, TraceGen attains 80% success across four tasks while offering 50-600x faster inference than state-of-the-art video-based world models. In the more challenging case where only five uncalibrated human demonstration videos captured on a handheld phone are available, it still reaches 67.5% success on a real robot, highlighting TraceGen's ability to adapt across embodiments without relying on object detectors or heavy pixel-space generation.
Recent grants
CRII: RI: Representation Learning and Adaptation using Unlabeled Videos
NSF · $173k · 2018–2021
Frequent coauthors
- Ming-Hsuan Yang · 66 shared
- Johannes Kopf · Alpha Omega Alpha Medical Honor Society · 45 shared
- Chang-Il Kim · 27 shared
- Changhee Jung · Purdue University System · 24 shared
- Ryan K. Williams · 24 shared
- Ashrarul H. Sifat · 24 shared
- Haibo Zeng · Virginia Tech · 23 shared
- Xuanliang Deng · 22 shared
Labs
Accounting and Information Systems · PI