Jingjing (Jing) Huang

Assistant Professor of Accounting and Information Systems · Verified

Virginia Tech · Accounting

Active 1993–2026

h-index: 59

Citations: 23.8k

Papers: 276 (174 in the last 5 years)

Funding: $173k


About

Dr. Jingjing (Jing) Huang joined the Pamplin College of Business at Virginia Tech in the Fall of 2014. She holds an Accounting Ph.D. from the University of Oregon, a Master’s Degree in Accounting from Iowa State University, and a Bachelor’s Degree in Business Administration from Shanghai University of Electric Power. She is licensed as a CPA in the state of Iowa. Her research interests include tax, financial accounting, corporate finance, and R&D innovation. Her dissertation examines the role of taxes in foreign earnings management for multinational companies, and her working paper investigates how companies trade off tax incentives against nontax costs in R&D investment decisions. Dr. Huang has previously taught at the University of Oregon and has worked as a federal tax associate at KPMG in Des Moines, Iowa, as well as an accounting intern at Deloitte and Touche in London and at HNI Corporation in Iowa and Hong Kong.

Research topics

Artificial Intelligence
Computer Science
Data Mining
Computer vision
Mathematics
Computer graphics (images)

Selected publications

CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
2026-03-06
article · Open access
Multi-shot generation requires preserving the identity of characters and settings across frames. Cinematic scene composition goes beyond standard multi-shot generation, introducing additional challenges such as expressing complex interactions among multiple characters and visual effects to convey creative narratives—challenges existing datasets cannot fully address. We present CineVerse, a large-scale dataset of diverse movie scenes labeled with shot-level annotations tailored for filmmaking. CineVerse includes refined scene descriptions, shot-type information, and newly extracted shot, character, setting descriptions. We validate our dataset by developing a baseline framework that first generates a scene plan containing detailed information for the overall scene and each individual shot, then produces a set of coherent keyframes. Our results show significant improvements in controlling and synthesizing cinematic content through the added context provided by CineVerse.
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
2025-06-10 · 10 citations
article
Video super-resolution (VSR) models achieve temporal consistency but often produce blurrier results than their image-based counterparts due to limited generative capacity. This prompts the question: can we adapt a generative image upsampler for VSR while preserving temporal consistency? We introduce VideoGigaGAN, a new generative VSR model that combines high-frequency detail with temporal stability, building on the large-scale GigaGAN image upsampler. Simple adaptations of GigaGAN for VSR led to flickering issues, so we propose techniques to enhance temporal consistency. We validate the effectiveness of VideoGigaGAN by comparing it with state-of-the-art VSR models on public datasets and showcasing video results with 8× upsampling.
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
arXiv (Cornell University) · 2025-12-05
preprint · Open access
Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from training VLMs on static internet-scale visual-language data that contain no causal interactions or action-conditioned changes. Consequently, it remains challenging to leverage VLMs for fine-grained robotic manipulation tasks that require physical understanding, reasoning, and corresponding action planning. To overcome this, we present SIMPACT, a test-time, SIMulation-enabled ACTion Planning framework that equips VLMs with physical reasoning through simulation-in-the-loop world modeling, without requiring any additional training. From a single RGB-D observation, SIMPACT efficiently constructs physics simulations, enabling the VLM to propose informed actions, observe simulated rollouts, and iteratively refine its reasoning. By integrating language reasoning with physics prediction, our simulation-enabled VLM can understand contact dynamics and action outcomes in a physically grounded way. Our method demonstrates state-of-the-art performance on five challenging, real-world rigid-body and deformable manipulation tasks that require fine-grained physical reasoning, outperforming existing general-purpose robotic manipulation models. Our results demonstrate that embedding physics understanding via efficient simulation into VLM reasoning at test time offers a promising path towards generalizable embodied intelligence. Project webpage can be found at https://simpact-bot.github.io
MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects
2025-06-10
article
We introduce MaDCoW, a method for correcting marginal distortion of arbitrary objects in wide-angle photography. People often use wide-angle photography—it is the default in smartphone cameras—but very-wide-fields-of-view produce distorted object appearance in image margins. In our system, a user annotates straight lines and regions of interest. MaDCoW solves for a separate linear perspective projection for each region and then jointly solves for a distortion-minimizing projection for the whole photograph. We show that MaDCoW can produce good results in cases where previous methods yield visible distortions.
Quality Assessment of DIBR-Synthesized Image Based on Multi-Scale Feature Fusion Deep Neural Network
2025-01-10
article · Senior author
With the improvement of Depth Image-Based Rendering (DIBR) technology, more complex distortions (e.g., stretching distortion) have appeared in DIBR-synthesized images, and DIBR Image Quality Assessment (IQA) algorithms previously designed for specific distortions (e.g., black holes, blurring) have become ineffective. In this paper, an effective metric is proposed for a new DIBR-synthesized image dataset that contains the most current distortion types. To simulate the multi-scale visual properties of the human eye, we use a Gaussian pyramid to obtain a multi-scale representation of an image, then fuse features of the reference and synthesized images using the pre-trained convolutional neural network DenseNet201, and finally use cosine similarity to measure the difference between the feature maps as the deviation between the reference and synthesized images. Experimental results on the latest DIBR-synthesized image dataset show that the proposed model is highly competitive with other IQA methods.
UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video
2025-03-25 · 1 citation
article
We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions with a single video. It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras, differing from NeRF's dense view settings. In this context, standard methods often yield subpar geometry and material estimates, such as inaccurate roof representations and numerous ‘floaters’. UrbanIR addresses these issues with novel losses that reduce errors in inverse graphics inference and rendering artifacts. Its techniques allow for precise shadow volume estimation in the original scene. The model's outputs support controllable editing, enabling photorealistic free-viewpoint renderings of night simulations, relit scenes, and inserted objects, marking a significant improvement over existing state-of-the-art methods. Our code and data will be made publicly available upon acceptance.
IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images
2025-06-10 · 2 citations
article
Inverse rendering seeks to recover 3D geometry, surface material, and lighting from captured images, enabling advanced applications such as novel-view synthesis, relighting, and virtual object insertion. However, most existing techniques rely on high dynamic range (HDR) images as input, limiting accessibility for general users. In response, we introduce IRIS, an inverse rendering framework that recovers the physically based material, spatially-varying HDR lighting, and camera response functions from multi-view, low-dynamic-range (LDR) images. By eliminating the dependence on HDR input, we make inverse rendering technology more accessible. We evaluate our approach on real-world and synthetic scenes and compare it with state-of-the-art methods. Our results show that IRIS effectively recovers HDR lighting, accurate material, and plausible camera response functions, supporting photorealistic relighting and object insertion.
Author response for "Network Traffic Prediction Model Based on Convolutional Neural Networks-Long Short-Term Memory and iTransformer"
2025-01-16
peer-review
A bird target detection model designed for substation scenarios
Scientific Reports · 2025-11-06
article · Open access
Substations are critical to the power grid, but they are often disturbed by bird activity, which can lead to power failures and outages. Conventional bird-detection methods are costly and lack long-term effectiveness. To address these issues, this paper proposes a bird target detection method, YOLO-birds, designed explicitly for substation scenarios. It utilizes the Faster-BiFPN module to optimize feature extraction and fusion by combining low-level and high-level features, thereby enhancing detection accuracy. In addition, the SPPBiF attention mechanism is introduced to address the challenge of detecting targets at different scales and small objects, such as birds. To further improve robustness, the Focal-EIoU loss function is also utilized to mitigate the effect of low-quality samples. To support this research and improve detection performance in real-world scenarios, a self-constructed dataset of bird images focusing on security threats at substations was created. Experimental results show that YOLO-birds achieves a mAP50 of 90.2%, which validates the effectiveness of the proposed method compared to other methods. The method can efficiently detect birds inhabiting substations, which can support targeted prevention of bird-caused accidents in power grids.
TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos
ArXiv.org · 2025-11-26
preprint · Open access
Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - humans and different robots - are abundant, differences in embodiment, camera, and environment hinder their direct use. We address the small-data problem by introducing a unifying, symbolic representation - a compact 3D "trace-space" of scene-level trajectories - that enables learning from cross-embodiment, cross-environment, and cross-task videos. We present TraceGen, a world model that predicts future motion in trace-space rather than pixel space, abstracting away appearance while retaining the geometric structure needed for manipulation. To train TraceGen at scale, we develop TraceForge, a data pipeline that transforms heterogeneous human and robot videos into consistent 3D traces, yielding a corpus of 123K videos and 1.8M observation-trace-language triplets. Pretraining on this corpus produces a transferable 3D motion prior that adapts efficiently: with just five target robot videos, TraceGen attains 80% success across four tasks while offering 50-600x faster inference than state-of-the-art video-based world models. In the more challenging case where only five uncalibrated human demonstration videos captured on a handheld phone are available, it still reaches 67.5% success on a real robot, highlighting TraceGen's ability to adapt across embodiments without relying on object detectors or heavy pixel-space generation.

Recent grants

CRII: RI: Representation Learning and Adaptation using Unlabeled Videos
NSF · $173k · 2018–2021

Frequent coauthors

Ming-Hsuan Yang
66 shared
Johannes Kopf
Alpha Omega Alpha Medical Honor Society
45 shared
Chang-Il Kim
27 shared
Changhee Jung
Purdue University System
24 shared
Ryan K. Williams
24 shared
Ashrarul H. Sifat
24 shared
Haibo Zeng
Virginia Tech
23 shared
Xuanliang Deng
22 shared

Labs

Accounting and Information Systems · PI
