Vishal Patel

Assistant Professor · Verified

Johns Hopkins University · Radiology and Radiological Science

Active 1975–2026

h-index 78

Citations 25.4k

Papers675370 last 5y

Funding $1.6M · 1 active



About

Vishal Patel is an associate professor of electrical and computer engineering at Johns Hopkins University and a member of the Vision and Image Understanding Lab. His research interests are focused on biomedical image analysis, biometrics, computer vision, machine learning, and signal and image processing. Patel has received numerous awards including the 2021 IEEE Signal Processing Society Pierre-Simon Laplace Early Career Technical Achievement Award, the 2021 NSF CAREER Award, the 2021 IAPR Young Biometrics Investigator Award, and has been recognized as a Fellow of the International Association for Pattern Recognition in 2024 and a 2026 IEEE Fellow. Prior to joining Johns Hopkins University, he was an A. Walter Tyson Assistant Professor at Rutgers University and a member of the research faculty at the University of Maryland Institute for Advanced Computer Studies. He completed his PhD in electrical engineering at the University of Maryland, College Park in 2010. Patel serves as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence journal, is a member of the Machine Learning for Signal Processing Committee of the IEEE Signal Processing Society, and holds the position of vice president of conferences for the IEEE Biometrics Council.

Research topics

Artificial Intelligence
Computer Science
Computer vision
Physics
Algorithm
Remote sensing
Radiology
Geography
Mathematics
Optics
Environmental science
Meteorology

Selected publications

On-Policy Distillation with Best-of-N Teacher Rollout Selection
ArXiv.org · 2026-05-10
article · Open access
On-policy distillation (OPD), which supervises a student on its own sampled trajectories, has emerged as a data-efficient post-training method for improving reasoning while avoiding the reward dependence of reinforcement learning and the catastrophic forgetting often observed in standard supervised fine-tuning. However, standard OPD typically computes teacher supervision under noisy student-generated contexts and often relies on a single stochastic teacher rollout per prompt. As a result, the supervision signal can be high-variance: the sampled teacher trajectory can be incorrect, uninformative, or poorly matched to the student's current reasoning behavior. To address this limitation, we propose BRTS, a Best-of-N Rollout Teacher Selection framework for on-policy distillation. BRTS augments standard student-context OPD with a teacher-context supervision branch constructed from the curated teacher trajectory. Rather than distilling from the first sampled teacher rollout, BRTS samples a small pool of teacher trajectories and selects the auxiliary trajectory using a simple priority rule: correctness first, student alignment second. When multiple correct teacher trajectories are available, BRTS chooses the one most aligned with the student's current behavior; when unconditioned teacher samples fail on harder prompts, it invokes a ground-truth-conditioned recovery step to elicit a natural derivation. The selected trajectory is then used to provide reliable teacher-context supervision inside the OPD loop, augmented with an auxiliary loss on the teacher trajectory. Experiments on AIME 2024, AIME 2025, and AMC 2023 show that BRTS improves over standard OPD on challenging reasoning benchmarks, with the largest gains on harder datasets. Our code is available at https://github.com/BWGZK-keke/BRTS.
Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection
2026-03-06
article · Open access · Senior author
Remote sensing change detection is often complicated by spatial misalignment between image pairs, especially when observations are separated by long temporal gaps such as seasonal or multi-year intervals. Conventional CNN- and transformer-based methods perform well on aligned data, but their reliance on perfect co-registration limits their applicability in practice. Existing approaches that integrate registration and change detection generally demand task-specific training and transfer poorly across domains. We present a lightweight, modular pipeline that strengthens robustness without retraining the underlying change detection models. The framework combines rapid per-image LoRA adaptation with a compact flow refinement module trained under supervision. To mitigate large appearance differences, we generate intermediate morphing frames via a diffusion-based semantic interpolator. Consecutive frames are aligned using a registration backbone (e.g., RoMa), and the composed flows are further corrected through a residual refinement network. The refined flow is then applied to co-register the original image pairs, enabling more reliable downstream change detection. Extensive experiments on LEVIR-CD, DSIFN-CD, and WHU-CD demonstrate that the proposed pipeline significantly improves both registration accuracy and change detection performance, especially in scenarios with substantial spatial and temporal variations.
DiffRegCD: Integrated Registration and Change Detection with Diffusion Features
2026-03-06
article · Open access · Senior author
Change detection (CD) is critical in computer vision and remote sensing, with applications in monitoring, disaster response, and urban analysis. Most CD models assume co-registered inputs, but real imagery often suffers from parallax, viewpoint shifts, or long temporal gaps, leading to severe misalignment. Conventional register-then-detect pipelines and recent joint frameworks (e.g., BiFA, ChangeRD) remain limited: they rely on regression-only flow, global homographies, or synthetic perturbations that fail under large displacements. We propose DiffRegCD, an integrated framework that couples dense registration and change detection. DiffRegCD reformulates correspondence as a Gaussian-smoothed classification task, delivering sub-pixel accuracy and stable training. It builds on frozen multi-scale features from a pretrained denoising diffusion model, which provide invariance to viewpoint and illumination variation. Supervision is enabled by controlled affine perturbations applied to standard CD datasets, yielding paired ground truth for both flow and change detection without pseudo-labels. Experiments on aerial (LEVIR-CD, DSIFN-CD, WHU-CD, SYSU-CD) and ground-level (VL-CMU-CD) datasets show that DiffRegCD outperforms recent baselines and remains robust under wide temporal and viewpoint variation, establishing diffusion features and classification-based correspondence as a strong foundation for integrated CD. The code is available at GitHub.
Referring Change Detection in Remote Sensing Imagery
2026-03-06
article · Senior author
Change detection in remote sensing imagery is essential for applications such as urban planning, environmental monitoring, and disaster management. Traditional change detection methods typically identify all changes between two temporal images without distinguishing the types of transitions, which can lead to results that may not align with specific user needs. Although semantic change detection methods have attempted to address this by categorizing changes into predefined classes, these methods rely on rigid class definitions and fixed model architectures, making it difficult to mix datasets with different label sets or reuse models across tasks, as the output channels are tightly coupled with the number and type of semantic classes. To overcome these limitations, we introduce Referring Change Detection (RCD), which leverages natural language prompts to detect specific classes of changes in remote sensing images. By integrating language understanding with visual analysis, our approach allows users to specify the exact type of change they are interested in. However, training models for RCD is challenging due to the limited availability of annotated data and severe class imbalance in existing datasets. To address this, we propose a two-stage framework consisting of (I) RCDNet, a cross-modal fusion network designed for referring change detection, and (II) RCDGen, a diffusion-based synthetic data generation pipeline that produces realistic post-change images and change maps for a specified category using only the pre-change image, without relying on semantic segmentation masks and thereby significantly lowering the barrier to scalable data creation. Experiments across multiple datasets show that our framework enables scalable and targeted change detection. Code will be made publicly available on GitHub.
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
ArXiv.org · 2025-10-14
preprint · Open access
Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. Existing approaches, such as retrieval-augmented generation (RAG) methods, search agents, and search-equipped MLLMs, often suffer from rigid pipelines, excessive search calls, and poorly constructed search queries, which result in inefficiencies and suboptimal outcomes. To address these limitations, we present DeepMMSearch-R1, the first multimodal LLM capable of performing on-demand, multi-turn web searches and dynamically crafting queries for both image and text search tools. Specifically, DeepMMSearch-R1 can initiate web searches based on relevant crops of the input image, making the image search more effective, and can iteratively adapt text search queries based on retrieved information, thereby enabling self-reflection and self-correction. Our approach relies on a two-stage training pipeline: a cold start supervised finetuning phase followed by an online reinforcement learning optimization. For training, we introduce DeepMMSearchVQA, a novel multimodal VQA dataset created through an automated pipeline intermixed with real-world information from web search tools. This dataset contains diverse, multi-hop queries that integrate textual and visual information, teaching the model when to search, what to search for, which search tool to use, and how to reason over the retrieved information. We conduct extensive experiments across a range of knowledge-intensive benchmarks to demonstrate the superiority of our approach. Finally, we analyze the results and provide insights that are valuable for advancing multimodal web search.
Investigating Data Replication in Medical Synthetic Image Generation with Diffusion Models
2025-08-18
article · Senior author
Recent advancements in diffusion models have greatly enhanced image generation quality, offering promise for addressing data scarcity in medical imaging, particularly for rare diseases. However, diffusion models sometimes replicate training images, raising privacy concerns, especially in healthcare. This study investigates image replication in medical diffusion models, its frequency, and potential risks to patient privacy. We analyze types of replication in synthetic data and propose methods to detect and measure replication. To safeguard privacy, we introduce mitigation strategies that can be applied before releasing synthetic data. Finally, we assess the impact of replicated and non-replicated synthetic data on medical image classification tasks for X-ray, Ultrasound, and CT images following our proposed mitigation measures.
Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data
2025-10-19
article · Open access
Large transformer-based models have made significant progress in generalizable novel view synthesis (NVS) from sparse input views, generating novel viewpoints without the need for test-time optimization. However, these models are constrained by the limited diversity of publicly available scene datasets, making most real-world (in-the-wild) scenes out-of-distribution. To overcome this, we incorporate synthetic training data generated from diffusion models, which improves generalization across unseen domains. While synthetic data offers scalability, we identify artifacts introduced during data generation as a key bottleneck affecting reconstruction quality. To address this, we propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. This refinement not only improves reconstruction quality over standard transformers but also enables scalable training with synthetic data. As a result, our method outperforms existing models on both in-dataset and cross-dataset evaluations, achieving state-of-the-art results across multiple benchmarks while significantly reducing computational costs. Project page: https://scaling3dnvs.github.io/
AUDIT QUALITY AND FINANCIAL HEALTH: A STUDY OF LEADING INDIAN BANKS
International Journal of Accounting Management Economics and Social Sciences (IJAMESC) · 2025-08-31
article · Open access · 1st author · Corresponding
This study investigates the relationship between audit quality and financial performance among leading Indian commercial banks from 2015 to 2024. Audit quality is assessed using a composite Audit Committee Score (ACS), developed through a binary scoring method based on 14 parameters that consider regulatory compliance and governance best practices. The research utilizes a panel of nine Nifty Bank Index banks, selected based on consistent data availability throughout the study period. The study is structured around two primary objectives. Firstly, to evaluate the effectiveness of audit committees, mean ACS values were calculated and used to rank the banks. The findings reveal that private sector banks, particularly Kotak Mahindra Bank and Federal Bank, consistently demonstrated higher audit effectiveness compared to their public sector counterparts. Secondly, the research examines the impact of audit quality on financial performance, with Return on Assets (ROA), Return on Equity (ROE), and Net Interest Margin (NIM) as dependent variables. Panel regression analysis, supported by relevant diagnostic tests and model selection criteria, indicates a statistically significant positive effect of ACS on ROA and ROE, while the effect on NIM was positive but not significant. The research underscores the crucial role of efficient audit committees in enhancing profitability and financial control in banks. The study recommends regulatory and institutional measures to further strengthen the structures of audit committees, particularly in public sector banks, in alignment with best governance practices and financial stability.
FreeViS: Training-free Video Stylization with Inconsistent References
ArXiv.org · 2025-10-02
preprint · Open access · Senior author
Video stylization plays a key role in content creation, but it remains a challenging problem. Naïvely applying image stylization frame-by-frame hurts temporal consistency and reduces style richness. Alternatively, training a dedicated video stylization model typically requires paired video data and is computationally expensive. In this paper, we propose FreeViS, a training-free video stylization framework that generates stylized videos with rich style details and strong temporal coherence. Our method integrates multiple stylized references to a pretrained image-to-video (I2V) model, effectively mitigating the propagation errors observed in prior works, without introducing flickers and stutters. In addition, it leverages high-frequency compensation to constrain the content layout and motion, together with flow-based motion cues to preserve style textures in low-saliency regions. Through extensive evaluations, FreeViS delivers higher stylization fidelity and superior temporal consistency, outperforming recent baselines and achieving strong human preference. Our training-free pipeline offers a practical and economic solution for high-quality, temporally coherent video stylization. The code and videos can be accessed via https://xujiacong.github.io/FreeViS/

Recent grants

SaTC: CORE: Medium: Collaborative: Presentation-attack-robust biometrics systems via computational imaging of physiology and materials
NSF · $300k · 2018–2019
CIF: Small: Collaborative Research: Sparse and Low Rank Methods for Imbalanced and Heterogeneous Data
NSF · $60k · 2018–2020
RI: Small: Collaborative Research: Active and Rapid Domain Generalization
NSF · $225k · 2019–2022
SaTC: CORE: Medium: Collaborative: Presentation-attack-robust biometrics systems via computational imaging of physiology and materials
NSF · $300k · 2018–2022
CIF: Small: Collaborative Research: Sparse and Low Rank Methods for Imbalanced and Heterogeneous Data
NSF · $249k · 2016–2019

Frequent coauthors

Fausto Milletarì
1158 shared
L. W. Davis
1156 shared
R. Nobili
Marche Polytechnic University
1156 shared
Francisco Javier Moreno-Morillo
Hospital Universitario Infanta Sofía
1156 shared
Claudio Lizio
Nvidia (United States)
1156 shared
Maria Distante
Universität Innsbruck
1156 shared
Antonio Greco
1156 shared
Fabio Galasso
Sapienza University of Rome
1156 shared

Awards & honors

2021 IEEE Signal Processing Society (SPS) Pierre-Simon Lapla…
2021 NSF CAREER Award
2021 IAPR Young Biometrics Investigator Award (YBIA)
Best Paper Award at IEEE AVSS in 2019
Best Paper Award at IEEE AVSS in 2017
