VG Vinod Vydiswaran

Verified

University of Michigan · Information

Active 2003–2025

h-index 23

Citations 1.5k

Papers 114 · 49 last 5y

Funding: not listed

Research topics

Computer Science
World Wide Web
Medicine
Gerontology

Selected publications

Correcting Performance Metrics Bias During Generalization from Biased Samples to Populations
Studies in health technology and informatics · 2025-08-07
article · Open access · Senior author
The performance of prediction algorithms is typically measured using four metrics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). These metrics are usually calculated on samples drawn from patient populations. However, the performance metrics computed over a deliberately biased sample would not directly extend to its source population. Further, it is often necessary to infer the metric values for a population different from where the sample was drawn. In this paper, we illustrate methods to solve both challenges. Specifically, given the underlying patient distribution, we show corrections to the formula for these metrics based on two common inverse probability weighting methods: standard cell weighting and logistic regression weighting. We conduct simulation experiments to identify patients living with dementia and compare these methods in performance corrections with different sample sizes for different prevalence settings. We empirically show that weighting methods can correct the estimated values for algorithms' performance. Standard cell weighting is preferred over logistic regression weighting when the sample size is small and only the strata information is available in the populations of interest.
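The standard cell weighting idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each sampled patient carries a stratum label, that the stratum sizes of the target population are known, and that each stratum's weight is its population count divided by its sample count. The names `cell_weights`, `weighted_metrics`, and the toy data are hypothetical.

```python
from collections import Counter


def cell_weights(sample_strata, population_counts):
    """Assumed weighting rule: population count / sample count per stratum."""
    sample_counts = Counter(sample_strata)
    return {s: population_counts[s] / n for s, n in sample_counts.items()}


def weighted_metrics(records, population_counts):
    """records: list of (stratum, y_true, y_pred) tuples with 0/1 labels.

    Each record adds its stratum weight (instead of 1) to the confusion
    matrix cell it falls in. Assumes every denominator is non-zero.
    """
    weights = cell_weights([r[0] for r in records], population_counts)
    tp = fp = fn = tn = 0.0
    for stratum, y_true, y_pred in records:
        w = weights[stratum]
        if y_true and y_pred:
            tp += w
        elif y_pred:
            fp += w
        elif y_true:
            fn += w
        else:
            tn += w
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Toy sample that deliberately over-represents the "high" risk stratum.
records = [
    ("high", 1, 1), ("high", 1, 1), ("high", 1, 0), ("high", 0, 0),
    ("low", 1, 1), ("low", 0, 1), ("low", 0, 0), ("low", 0, 0),
]
population_counts = {"high": 100, "low": 900}  # hypothetical population
metrics = weighted_metrics(records, population_counts)
```

In this toy sample the unweighted PPV is 0.75, while the weighted PPV is 0.55, because the low-risk stratum, where the false positive occurs, makes up most of the assumed population.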
Unhealthy alcohol use detection in electronic health records: A comparative study using natural language processing
Drug and Alcohol Dependence · 2025-10-10
article · Open access
Comparing methods for determining home and work locations from geotagged social media data
2025-05-30
preprint · Open access
Geotagged social media data have emerged as a rich source of insight about spatial dimensions of social phenomena. This methodological article exploits a unique dataset that combines geotagged social media content and home and work locations collected from social media users through a survey to compare three methods of assigning home locations from geotagged social media: majority voting, time frame clustering, and a novel method using activity spaces created from users’ geotagged posts. Using exact match accuracy as the measure, the basic majority voting method achieved higher estimates for both current and previous home location predictions than the time frame clustering method. However, for work location prediction, time frame clustering showed better accuracy, and the activity space method contained 25.3% of true home and 44.4% of true work locations. The study found lower precision than other studies and highlights accuracy trade-offs among the options for assigning home or work locations from geotagged social media.
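The majority voting baseline and the exact match accuracy measure named in this abstract can be illustrated with a short sketch. It assumes posts have already been mapped to discrete area identifiers (for example census tracts) and that ties go to the area seen first; both are assumptions of this example, and the function names and data are hypothetical rather than taken from the study.

```python
from collections import Counter, defaultdict


def majority_vote_locations(posts):
    """posts: iterable of (user_id, area_id) pairs.

    Returns {user_id: area_id posted from most often}. Ties go to the
    area seen first, which is how Counter.most_common orders equal counts.
    """
    counts_by_user = defaultdict(Counter)
    for user_id, area_id in posts:
        counts_by_user[user_id][area_id] += 1
    return {u: c.most_common(1)[0][0] for u, c in counts_by_user.items()}


def exact_match_accuracy(predicted, truth):
    """Share of users whose predicted area equals the surveyed area."""
    users = [u for u in truth if u in predicted]
    if not users:
        return 0.0
    return sum(predicted[u] == truth[u] for u in users) / len(users)


# Hypothetical geotagged posts and survey-reported home areas.
posts = [
    ("u1", "A"), ("u1", "A"), ("u1", "B"),
    ("u2", "C"), ("u2", "D"), ("u2", "D"),
]
surveyed_home = {"u1": "A", "u2": "C"}
predicted_home = majority_vote_locations(posts)
accuracy = exact_match_accuracy(predicted_home, surveyed_home)
```

Here user u2 posts mostly from area D but reports living in area C, so the exact match accuracy is 0.5.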
Comparing methods for determining home and work locations from geotagged social media data
2025-05-22
preprint · Open access
Generalizing machine learning models from clinical free text
Scientific Reports · 2025-08-28 · 2 citations
article · Open access
To assess strategies for enhancing the generalizability of healthcare artificial intelligence models, we analyzed the impact of preprocessing approaches applied to medical free text, compared single- versus multiple-institution data models, and evaluated data divergence metrics. From 1,607,393 procedures across 44 U.S. institutions, deep neural network models were created to classify anesthesiology Current Procedural Terminology codes from medical free text. Three levels of text preprocessing were analyzed from minimal to automated (cSpell) with comprehensive physician review. Kullback–Leibler Divergence and k-medoid clustering were used to predict single- vs multiple-institutional model performances. Single-institution models showed a mean accuracy of 92.5% [2.8% SD] and 0.923 [0.029] F1 on internal data but generalized poorly on external data (−22.4% [7.0%]; −0.223 [0.081]). Free text preprocessing minimally altered performance (+0.51% [2.23]; +0.004 [0.020]). An all-institution model performed worse on internal data (−4.88% [2.43%]; −0.045 [0.020]), but improved generalizability to external data (+17.1% [8.7%]; +0.182 [0.073]). Compared to vocabulary overlap and Jaccard similarity, Kullback–Leibler Divergence correlated with model performance (R² of 0.41 vs 0.16 vs 0.08, respectively) and was successful clustering institutions and identifying outlier data. Overall, pre-processing medical free text showed limited utility improving generalization of machine learning models, single institution models performed best but generalized poorly, while combined data models improved generalization but never achieved performance of single-institutional models. Kullback–Leibler Divergence provided valuable insight as a reliable heuristic to evaluate generalizability. These results have important implications in developing broad use artificial intelligence healthcare applications, providing valuable insight into their development and evaluations.
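Two of the divergence measures compared in this abstract, Jaccard similarity and Kullback–Leibler divergence, can be sketched for a pair of token lists. The abstract does not say how the distributions were estimated, so the add-one smoothing over the shared vocabulary used here is an assumption of this example, and the function names and token lists are hypothetical.

```python
import math
from collections import Counter


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the vocabulary intersection divided by size of the union."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def kl_divergence(tokens_p, tokens_q, alpha=1.0):
    """KL(P || Q) in nats between smoothed unigram distributions.

    alpha is an assumed additive smoothing constant; it keeps Q non-zero
    for tokens that occur only in P.
    """
    vocab = set(tokens_p) | set(tokens_q)
    if not vocab:
        return 0.0
    counts_p, counts_q = Counter(tokens_p), Counter(tokens_q)
    total_p = len(tokens_p) + alpha * len(vocab)
    total_q = len(tokens_q) + alpha * len(vocab)
    divergence = 0.0
    for token in vocab:
        p = (counts_p[token] + alpha) / total_p
        q = (counts_q[token] + alpha) / total_q
        divergence += p * math.log(p / q)
    return divergence


# Hypothetical procedure-text tokens from two institutions.
site_a = "lap chole general".split()
site_b = "lap appy general general".split()
similarity = jaccard_similarity(site_a, site_b)
divergence_ab = kl_divergence(site_a, site_b)
```

KL divergence is asymmetric, so `kl_divergence(site_a, site_b)` and `kl_divergence(site_b, site_a)` generally differ; which direction suits a given comparison is a design choice this sketch leaves open.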
Divide-or-Conquer? Which Part Should You Distill Your LLM?
arXiv (Cornell University) · 2024-02-22 · 2 citations
preprint · Open access
Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.
Automated‐detection of risky alcohol use prior to surgery using natural language processing
Alcohol Clinical and Experimental Research · 2024-01-01 · 11 citations
article · Open access · 1st author
BACKGROUND: Preoperative risky alcohol use is one of the most common surgical risk factors. Accurate and early identification of risky alcohol use could enhance surgical safety. Artificial Intelligence-based approaches, such as natural language processing (NLP), provide an innovative method to identify alcohol-related risks from patients' electronic health records (EHR) before surgery. METHODS: Clinical notes (n = 53,629) from pre-operative patients in a tertiary care facility were analyzed for evidence of risky alcohol use and alcohol use disorder. One hundred of these records were reviewed by experts and labeled for comparison. A rule-based NLP model was built, and we assessed the clinical notes for the entire population. Additionally, we assessed each record for the presence or absence of alcohol-related International Classification of Diseases (ICD) diagnosis codes as an additional comparator. RESULTS: NLP correctly identified 87% of the human-labeled patients classified with risky alcohol use. In contrast, diagnosis codes alone correctly identified only 29% of these patients. In terms of specificity, NLP correctly identified 84% of the non-risky cohort, while diagnosis codes correctly identified 90% of this cohort. In the analysis of the full dataset, the NLP-based approach identified three times more patients with risky alcohol use than ICD codes. CONCLUSIONS: NLP, an artificial intelligence-based approach, efficiently and accurately identifies alcohol-related risk in patients' EHRs. This approach could supplement other alcohol screening tools to identify patients in need of intervention, treatment, and/or postoperative withdrawal prophylaxis. Alcohol-related ICD diagnosis had limited utility relative to NLP, which extracts richer information within clinical notes to classify patients.
Virtual Care: Perspectives From Family Physicians
Family Medicine · 2024-04-15 · 3 citations
article · Open access
BACKGROUND: During the COVID-19 pandemic, virtual care expanded rapidly at Michigan Medicine and other health systems. From family physicians' perspectives, this shift to virtual care has the potential to affect workflow, job satisfaction, and patient communication. As clinics reopened and care delivery models shifted to a combination of in-person and virtual care, the need to understand physician experiences with virtual care arose in order to improve both patient and provider experiences. This study investigated Michigan Medicine family medicine physicians' perceptions of virtual care through qualitative interviews to better understand how to improve the quality and effectiveness of virtual care for both patients and physicians. METHODS: We employed a qualitative descriptive design to examine physician perspectives through semistructured interviews. We coded and analyzed transcripts using thematic analysis, facilitated by MAXQDA (VERBI) software. RESULTS: The results of the analysis identified four major themes: (a) chief concerns that are appropriate for virtual evaluation, (b) physician perceptions of patient benefits, (c) focused but contextually enriched patient-physician communication, and (d) structural support needed for high-quality virtual care. CONCLUSIONS: These findings can help further direct the discussion of how to make use of resources to improve the quality and effectiveness of virtual care.
Divide-or-Conquer? Which Part Should You Distill Your LLM?
2024-01-01 · 7 citations
article · Open access
Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.
Social Acceptability of Health Behavior Posts on Social Media: An Experiment
American Journal of Preventive Medicine · 2024-01-06 · 2 citations
article · Open access

Frequent coauthors

Tiffany C. Veinot
University of Michigan–Ann Arbor
22 shared
Qiaozhu Mei
15 shared
Robert Goodspeed
University of Michigan–Ann Arbor
15 shared
Deahan Yu
14 shared
Dan Roth
14 shared
Kai Zheng
China University of Geosciences (Beijing)
13 shared
David A. Hanauer
University of Michigan–Ann Arbor
12 shared
Danny T Y Wu
Cincinnati Children's Hospital Medical Center
11 shared

Labs

VYDISWARAN LAB · PI
