
Vishal Patel
Assistant Professor · Johns Hopkins University · Radiology and Radiological Science
Active 1975–2026
About
Vishal Patel is an associate professor of electrical and computer engineering at Johns Hopkins University and a member of the Vision and Image Understanding Lab. His research interests are focused on biomedical image analysis, biometrics, computer vision, machine learning, and signal and image processing. Patel has received numerous awards including the 2021 IEEE Signal Processing Society Pierre-Simon Laplace Early Career Technical Achievement Award, the 2021 NSF CAREER Award, the 2021 IAPR Young Biometrics Investigator Award, and has been recognized as a Fellow of the International Association for Pattern Recognition in 2024 and a 2026 IEEE Fellow. Prior to joining Johns Hopkins University, he was an A. Walter Tyson Assistant Professor at Rutgers University and a member of the research faculty at the University of Maryland Institute for Advanced Computer Studies. He completed his PhD in electrical engineering at the University of Maryland, College Park in 2010. Patel serves as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence journal, is a member of the Machine Learning for Signal Processing Committee of the IEEE Signal Processing Society, and holds the position of vice president of conferences for the IEEE Biometrics Council.
Research topics
- Artificial Intelligence
- Computer Science
- Computer vision
- Physics
- Algorithm
- Remote sensing
- Radiology
- Geography
- Mathematics
- Optics
- Environmental science
- Meteorology
Selected publications
On-Policy Distillation with Best-of-N Teacher Rollout Selection
ArXiv.org · 2026-05-10
article · Open access
On-policy distillation (OPD), which supervises a student on its own sampled trajectories, has emerged as a data-efficient post-training method for improving reasoning while avoiding the reward dependence of reinforcement learning and the catastrophic forgetting often observed in standard supervised fine-tuning. However, standard OPD typically computes teacher supervision under noisy student-generated contexts and often relies on a single stochastic teacher rollout per prompt. As a result, the supervision signal can be high-variance: the sampled teacher trajectory can be incorrect, uninformative, or poorly matched to the student's current reasoning behavior. To address this limitation, we propose BRTS, a Best-of-N Rollout Teacher Selection framework for on-policy distillation. BRTS augments standard student-context OPD with a teacher-context supervision branch constructed from the curated teacher trajectory. Rather than distilling from the first sampled teacher rollout, BRTS samples a small pool of teacher trajectories and selects the auxiliary trajectory using a simple priority rule: correctness first, student alignment second. When multiple correct teacher trajectories are available, BRTS chooses the one most aligned with the student's current behavior; when unconditioned teacher samples fail on harder prompts, it invokes a ground-truth-conditioned recovery step to elicit a natural derivation. The selected trajectory is then used to provide reliable teacher-context supervision inside the OPD loop, augmented with an auxiliary loss on the teacher trajectory. Experiments on AIME 2024, AIME 2025, and AMC 2023 show that BRTS improves over standard OPD on challenging reasoning benchmarks, with the largest gains on harder datasets. Our code is available at https://github.com/BWGZK-keke/BRTS.
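The priority rule described in this abstract (correctness first, student alignment second, with a recovery step when no sampled rollout is correct) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rollout` type, the `student_logprob` alignment score, and the `recover` callback are hypothetical names introduced here, and the abstract does not say how alignment is actually measured.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rollout:
    text: str
    is_correct: bool        # assumed: final answer matches the reference
    student_logprob: float  # assumed alignment score (higher = closer to the student)


def select_teacher_rollout(
    pool: List[Rollout],
    recover: Optional[Callable[[], Rollout]] = None,
) -> Optional[Rollout]:
    """Pick one teacher rollout from a small sampled pool."""
    # Priority 1: keep only correct rollouts.
    correct = [r for r in pool if r.is_correct]
    if correct:
        # Priority 2: among correct rollouts, prefer the best-aligned one.
        return max(correct, key=lambda r: r.student_logprob)
    # No correct sample: fall back to a (hypothetical) recovery step,
    # e.g. re-prompting the teacher with the ground-truth answer.
    if recover is not None:
        return recover()
    return None


pool = [
    Rollout("a", False, -0.1),
    Rollout("b", True, -2.0),
    Rollout("c", True, -0.5),
]
chosen = select_teacher_rollout(pool)
```

Here the incorrect rollout "a" is discarded even though it has the highest alignment score, and "c" is chosen over "b" because it is the better-aligned of the two correct rollouts.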
2026-03-06
article · Open access · Senior author
Remote sensing change detection is often complicated by spatial misalignment between image pairs, especially when observations are separated by long temporal gaps such as seasonal or multi-year intervals. Conventional CNN- and transformer-based methods perform well on aligned data, but their reliance on perfect co-registration limits their applicability in practice. Existing approaches that integrate registration and change detection generally demand task-specific training and transfer poorly across domains. We present a lightweight, modular pipeline that strengthens robustness without retraining the underlying change detection models. The framework combines rapid per-image LoRA adaptation with a compact flow refinement module trained under supervision. To mitigate large appearance differences, we generate intermediate morphing frames via a diffusion-based semantic interpolator. Consecutive frames are aligned using a registration backbone (e.g., RoMa), and the composed flows are further corrected through a residual refinement network. The refined flow is then applied to co-register the original image pairs, enabling more reliable downstream change detection. Extensive experiments on LEVIR-CD, DSIFN-CD, and WHU-CD demonstrate that the proposed pipeline significantly improves both registration accuracy and change detection performance, especially in scenarios with substantial spatial and temporal variations.
DiffRegCD: Integrated Registration and Change Detection with Diffusion Features
2026-03-06
article · Open access · Senior author
Change detection (CD) is critical in computer vision and remote sensing, with applications in monitoring, disaster response, and urban analysis. Most CD models assume co-registered inputs, but real imagery often suffers from parallax, viewpoint shifts, or long temporal gaps, leading to severe misalignment. Conventional register-then-detect pipelines and recent joint frameworks (e.g., BiFA, ChangeRD) remain limited: they rely on regression-only flow, global homographies, or synthetic perturbations that fail under large displacements. We propose DiffRegCD, an integrated framework that couples dense registration and change detection. DiffRegCD reformulates correspondence as a Gaussian-smoothed classification task, delivering sub-pixel accuracy and stable training. It builds on frozen multi-scale features from a pretrained denoising diffusion model, which provide invariance to viewpoint and illumination variation. Supervision is enabled by controlled affine perturbations applied to standard CD datasets, yielding paired ground truth for both flow and change detection without pseudo-labels. Experiments on aerial (LEVIR-CD, DSIFN-CD, WHU-CD, SYSU-CD) and ground-level (VL-CMU-CD) datasets show that DiffRegCD outperforms recent baselines and remains robust under wide temporal and viewpoint variation, establishing diffusion features and classification-based correspondence as a strong foundation for integrated CD. The code is available at GitHub.
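As a rough illustration of what "correspondence as a Gaussian-smoothed classification task" can look like, the sketch below builds a soft label over candidate grid cells, scores logits against it with cross-entropy, and decodes a location as the expectation over cells. It is a generic pure-Python sketch under stated assumptions, not DiffRegCD's code: the grid size, the `sigma` value, the function names, and the expectation-based decoding are all choices made here for illustration.

```python
import math


def gaussian_target(gt_x, gt_y, width, height, sigma=1.0):
    """Soft label: normalized Gaussian over grid cells, centered on the
    (possibly sub-pixel) ground-truth match location."""
    weights = [
        [
            math.exp(-((x - gt_x) ** 2 + (y - gt_y) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def cross_entropy(target, logits):
    """Cross-entropy between the soft label and a softmax over all cells."""
    flat = [v for row in logits for v in row]
    m = max(flat)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in flat))
    return -sum(
        t * (l - log_z)
        for t_row, l_row in zip(target, logits)
        for t, l in zip(t_row, l_row)
    )


def expected_location(probs):
    """Decode a continuous (x, y) as the probability-weighted mean cell."""
    x = sum(p * x for row in probs for x, p in enumerate(row))
    y = sum(p * y for y, row in enumerate(probs) for p in row)
    return x, y


target = gaussian_target(3.0, 2.0, width=7, height=5, sigma=1.0)
center = expected_location(target)
```

Because the label spreads probability over neighboring cells instead of a single one-hot cell, the decoded expectation can fall between grid positions, which is one common way a classification formulation yields sub-pixel estimates.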
Referring Change Detection in Remote Sensing Imagery
2026-03-06
article · Senior author
Change detection in remote sensing imagery is essential for applications such as urban planning, environmental monitoring, and disaster management. Traditional change detection methods typically identify all changes between two temporal images without distinguishing the types of transitions, which can lead to results that may not align with specific user needs. Although semantic change detection methods have attempted to address this by categorizing changes into predefined classes, these methods rely on rigid class definitions and fixed model architectures, making it difficult to mix datasets with different label sets or reuse models across tasks, as the output channels are tightly coupled with the number and type of semantic classes. To overcome these limitations, we introduce Referring Change Detection (RCD), which leverages natural language prompts to detect specific classes of changes in remote sensing images. By integrating language understanding with visual analysis, our approach allows users to specify the exact type of change they are interested in. However, training models for RCD is challenging due to the limited availability of annotated data and severe class imbalance in existing datasets. To address this, we propose a two-stage framework consisting of (I) RCDNet, a cross-modal fusion network designed for referring change detection, and (II) RCDGen, a diffusion-based synthetic data generation pipeline that produces realistic post-change images and change maps for a specified category using only the pre-change image, without relying on semantic segmentation masks and thereby significantly lowering the barrier to scalable data creation. Experiments across multiple datasets show that our framework enables scalable and targeted change detection. Code will be made publicly available on GitHub.
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
ArXiv.org · 2025-10-14
preprint · Open access
Multimodal Large Language Models (MLLMs) in real-world applications require access to external knowledge sources and must remain responsive to the dynamic and ever-changing real-world information in order to address information-seeking and knowledge-intensive user queries. Existing approaches, such as retrieval-augmented generation (RAG) methods, search agents, and search-equipped MLLMs, often suffer from rigid pipelines, excessive search calls, and poorly constructed search queries, which result in inefficiencies and suboptimal outcomes. To address these limitations, we present DeepMMSearch-R1, the first multimodal LLM capable of performing on-demand, multi-turn web searches and dynamically crafting queries for both image and text search tools. Specifically, DeepMMSearch-R1 can initiate web searches based on relevant crops of the input image, making the image search more effective, and can iteratively adapt text search queries based on retrieved information, thereby enabling self-reflection and self-correction. Our approach relies on a two-stage training pipeline: a cold-start supervised finetuning phase followed by an online reinforcement learning optimization. For training, we introduce DeepMMSearchVQA, a novel multimodal VQA dataset created through an automated pipeline intermixed with real-world information from web search tools. This dataset contains diverse, multi-hop queries that integrate textual and visual information, teaching the model when to search, what to search for, which search tool to use, and how to reason over the retrieved information. We conduct extensive experiments across a range of knowledge-intensive benchmarks to demonstrate the superiority of our approach. Finally, we analyze the results and provide insights that are valuable for advancing multimodal web search.
Investigating Data Replication in Medical Synthetic Image Generation with Diffusion Models
2025-08-18
article · Senior author
Recent advancements in diffusion models have greatly enhanced image generation quality, offering promise for addressing data scarcity in medical imaging, particularly for rare diseases. However, diffusion models sometimes replicate training images, raising privacy concerns, especially in healthcare. This study investigates image replication in medical diffusion models, its frequency, and potential risks to patient privacy. We analyze types of replication in synthetic data and propose methods to detect and measure replication. To safeguard privacy, we introduce mitigation strategies that can be applied before releasing synthetic data. Finally, we assess the impact of replicated and non-replicated synthetic data on medical image classification tasks for X-ray, Ultrasound, and CT images following our proposed mitigation measures.
Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data
2025-10-19
article · Open access
Large transformer-based models have made significant progress in generalizable novel view synthesis (NVS) from sparse input views, generating novel viewpoints without the need for test-time optimization. However, these models are constrained by the limited diversity of publicly available scene datasets, making most real-world (in-the-wild) scenes out-of-distribution. To overcome this, we incorporate synthetic training data generated from diffusion models, which improves generalization across unseen domains. While synthetic data offers scalability, we identify artifacts introduced during data generation as a key bottleneck affecting reconstruction quality. To address this, we propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. This refinement not only improves reconstruction quality over standard transformers but also enables scalable training with synthetic data. As a result, our method outperforms existing models on both in-dataset and cross-dataset evaluations, achieving state-of-the-art results across multiple benchmarks while significantly reducing computational costs. Project page: https://scaling3dnvs.github.io/
AUDIT QUALITY AND FINANCIAL HEALTH: A STUDY OF LEADING INDIAN BANKS
International Journal of Accounting Management Economics and Social Sciences (IJAMESC) · 2025-08-31
article · Open access · 1st author · Corresponding
This study investigates the relationship between audit quality and financial performance among leading Indian commercial banks from 2015 to 2024. Audit quality is assessed using a composite Audit Committee Score (ACS), developed through a binary scoring method based on 14 parameters that consider regulatory compliance and governance best practices. The research utilizes a panel of nine Nifty Bank Index banks, selected based on consistent data availability throughout the study period. The study is structured around two primary objectives. Firstly, to evaluate the effectiveness of audit committees, mean ACS values were calculated and used to rank the banks. The findings reveal that private sector banks, particularly Kotak Mahindra Bank and Federal Bank, consistently demonstrated higher audit effectiveness compared to their public sector counterparts. Secondly, the research examines the impact of audit quality on financial performance, with Return on Assets (ROA), Return on Equity (ROE), and Net Interest Margin (NIM) as dependent variables. Panel regression analysis, supported by relevant diagnostic tests and model selection criteria, indicates a statistically significant positive effect of ACS on ROA and ROE, while the effect on NIM was positive but not significant. The research underscores the crucial role of efficient audit committees in enhancing profitability and financial control in banks. The study recommends regulatory and institutional measures to further strengthen the structures of audit committees, particularly in public sector banks, in alignment with best governance practices and financial stability.
FreeViS: Training-free Video Stylization with Inconsistent References
ArXiv.org · 2025-10-02
preprint · Open access · Senior author
Video stylization plays a key role in content creation, but it remains a challenging problem. Naïvely applying image stylization frame-by-frame hurts temporal consistency and reduces style richness. Alternatively, training a dedicated video stylization model typically requires paired video data and is computationally expensive. In this paper, we propose FreeViS, a training-free video stylization framework that generates stylized videos with rich style details and strong temporal coherence. Our method integrates multiple stylized references to a pretrained image-to-video (I2V) model, effectively mitigating the propagation errors observed in prior works, without introducing flickers and stutters. In addition, it leverages high-frequency compensation to constrain the content layout and motion, together with flow-based motion cues to preserve style textures in low-saliency regions. Through extensive evaluations, FreeViS delivers higher stylization fidelity and superior temporal consistency, outperforming recent baselines and achieving strong human preference. Our training-free pipeline offers a practical and economic solution for high-quality, temporally coherent video stylization. The code and videos can be accessed via https://xujiacong.github.io/FreeViS/
Recent grants
NSF · $300k · 2018–2019
NSF · $60k · 2018–2020
RI: Small: Collaborative Research: Active and Rapid Domain Generalization
NSF · $225k · 2019–2022
NSF · $300k · 2018–2022
NSF · $249k · 2016–2019
Frequent coauthors
- Fausto Milletarì · 1158 shared
- L. W. Davis · 1156 shared
- R. Nobili · Marche Polytechnic University · 1156 shared
- Francisco Javier Moreno-Morillo · Hospital Universitario Infanta Sofía · 1156 shared
- Claudio Lizio · Nvidia (United States) · 1156 shared
- Maria Distante · Universität Innsbruck · 1156 shared
- Antonio Greco · 1156 shared
- Fabio Galasso · Sapienza University of Rome · 1156 shared
Awards & honors
- 2021 IEEE Signal Processing Society (SPS) Pierre-Simon Lapla…
- 2021 NSF CAREER Award
- 2021 IAPR Young Biometrics Investigator Award (YBIA)
- Best Paper Award at IEEE AVSS in 2019
- Best Paper Award at IEEE AVSS in 2017