Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • Cancel anytime

VG Vinod Vydiswaran

Verified

University of Michigan · Information

Active 2003–2025

h-index: 23
Citations: 1.5k
Papers: 114 · 49 in the last 5 years
Funding

Research topics

  • Computer Science
  • World Wide Web
  • Medicine
  • Gerontology

Selected publications

  • Correcting Performance Metrics Bias During Generalization from Biased Samples to Populations

    Studies in Health Technology and Informatics · 2025-08-07

    Article · Open access · Senior author

    The performance of prediction algorithms is typically measured using four metrics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). These metrics are usually calculated on samples drawn from patient populations. However, the performance metrics computed over a deliberately biased sample would not directly extend to its source population. Further, it is often necessary to infer the metric values for a population different from where the sample was drawn. In this paper, we illustrate methods to solve both challenges. Specifically, given the underlying patient distribution, we show corrections to the formula for these metrics based on two common inverse probability weighting methods: standard cell weighting and logistic regression weighting. We conduct simulation experiments to identify patients living with dementia and compare these methods in performance corrections with different sample sizes for different prevalence settings. We empirically show that weighting methods can correct the estimated values for algorithms' performance. Standard cell weighting is preferred over logistic regression weighting when the sample size is small and only the strata information is available in the populations of interest.

  • Unhealthy alcohol use detection in electronic health records: A comparative study using natural language processing

    Drug and Alcohol Dependence · 2025-10-10

    Article · Open access

  • Comparing methods for determining home and work locations from geotagged social media data

    2025-05-30 (also listed with date 2025-05-22)

    Preprint · Open access

    Geotagged social media data have emerged as a rich source of insight about spatial dimensions of social phenomena. This methodological article exploits a unique dataset that combines geotagged social media content and home and work locations collected from social media users through a survey to compare three methods of assigning home locations from geotagged social media: majority voting, time frame clustering, and a novel method using activity spaces created from users’ geotagged posts. Using exact match accuracy as the measure, the basic majority voting method achieved better high estimates for both current and previous home location predictions compared to the time frame clustering method. However, for work location prediction, time frame clustering showed better accuracy, and the activity space method contained 25.3% of true home and 44.4% of true work locations. The study found lower precision than others and highlights accuracy trade-offs among each option for assigning home or work locations from geotagged social media.

  • Generalizing machine learning models from clinical free text

    Scientific Reports · 2025-08-28 · 2 citations

    Article · Open access

    To assess strategies for enhancing the generalizability of healthcare artificial intelligence models, we analyzed the impact of preprocessing approaches applied to medical free text, compared single- versus multiple-institution data models, and evaluated data divergence metrics. From 1,607,393 procedures across 44 U.S. institutions, deep neural network models were created to classify anesthesiology Current Procedural Terminology codes from medical free text. Three levels of text preprocessing were analyzed from minimal to automated (cSpell) with comprehensive physician review. Kullback–Leibler Divergence and k-medoid clustering were used to predict single- vs multiple-institutional model performances. Single-institution models showed a mean accuracy of 92.5% [2.8% SD] and 0.923 [0.029] F1 on internal data but generalized poorly on external data (−22.4% [7.0%]; −0.223 [0.081]). Free text preprocessing minimally altered performance (+0.51% [2.23]; +0.004 [0.020]). An all-institution model performed worse on internal data (−4.88% [2.43%]; −0.045 [0.020]), but improved generalizability to external data (+17.1% [8.7%]; +0.182 [0.073]). Compared to vocabulary overlap and Jaccard similarity, Kullback–Leibler Divergence correlated with model performance (R² of 0.41 vs 0.16 vs 0.08, respectively) and was successful clustering institutions and identifying outlier data. Overall, pre-processing medical free text showed limited utility improving generalization of machine learning models, single institution models performed best but generalized poorly, while combined data models improved generalization but never achieved performance of single-institutional models. Kullback–Leibler Divergence provided valuable insight as a reliable heuristic to evaluate generalizability. These results have important implications in developing broad use artificial intelligence healthcare applications, providing valuable insight into their development and evaluations.

  • Divide-or-Conquer? Which Part Should You Distill Your LLM?

    arXiv (Cornell University) · 2024-02-22 · 2 citations

    Preprint · Open access

    Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model compared to the problem solving because the latter requires large amounts of domain knowledge while the former only requires learning general problem solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase and at the same time achieve good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance and the resulting distilled model struggles with generalization. These results indicate that by using smaller, distilled problem decomposition models in combination with problem solving LLMs we can achieve reasoning with cost-efficient inference and local adaptation.

  • Automated‐detection of risky alcohol use prior to surgery using natural language processing

    Alcohol Clinical and Experimental Research · 2024-01-01 · 11 citations

    Article · Open access · First author

    BACKGROUND: Preoperative risky alcohol use is one of the most common surgical risk factors. Accurate and early identification of risky alcohol use could enhance surgical safety. Artificial Intelligence-based approaches, such as natural language processing (NLP), provide an innovative method to identify alcohol-related risks from patients' electronic health records (EHR) before surgery. METHODS: Clinical notes (n = 53,629) from pre-operative patients in a tertiary care facility were analyzed for evidence of risky alcohol use and alcohol use disorder. One hundred of these records were reviewed by experts and labeled for comparison. A rule-based NLP model was built, and we assessed the clinical notes for the entire population. Additionally, we assessed each record for the presence or absence of alcohol-related International Classification of Diseases (ICD) diagnosis codes as an additional comparator. RESULTS: NLP correctly identified 87% of the human-labeled patients classified with risky alcohol use. In contrast, diagnosis codes alone correctly identified only 29% of these patients. In terms of specificity, NLP correctly identified 84% of the non-risky cohort, while diagnosis codes correctly identified 90% of this cohort. In the analysis of the full dataset, the NLP-based approach identified three times more patients with risky alcohol use than ICD codes. CONCLUSIONS: NLP, an artificial intelligence-based approach, efficiently and accurately identifies alcohol-related risk in patients' EHRs. This approach could supplement other alcohol screening tools to identify patients in need of intervention, treatment, and/or postoperative withdrawal prophylaxis. Alcohol-related ICD diagnosis had limited utility relative to NLP, which extracts richer information within clinical notes to classify patients.

  • Virtual Care: Perspectives From Family Physicians

    Family Medicine · 2024-04-15 · 3 citations

    Article · Open access

    BACKGROUND: During the COVID-19 pandemic, virtual care expanded rapidly at Michigan Medicine and other health systems. From family physicians' perspectives, this shift to virtual care has the potential to affect workflow, job satisfaction, and patient communication. As clinics reopened and care delivery models shifted to a combination of in-person and virtual care, the need to understand physician experiences with virtual care arose in order to improve both patient and provider experiences. This study investigated Michigan Medicine family medicine physicians' perceptions of virtual care through qualitative interviews to better understand how to improve the quality and effectiveness of virtual care for both patients and physicians. METHODS: We employed a qualitative descriptive design to examine physician perspectives through semistructured interviews. We coded and analyzed transcripts using thematic analysis, facilitated by MAXQDA (VERBI) software. RESULTS: The results of the analysis identified four major themes: (a) chief concerns that are appropriate for virtual evaluation, (b) physician perceptions of patient benefits, (c) focused but contextually enriched patient-physician communication, and (d) structural support needed for high-quality virtual care. CONCLUSIONS: These findings can help further direct the discussion of how to make use of resources to improve the quality and effectiveness of virtual care.

  • Divide-or-Conquer? Which Part Should You Distill Your LLM?

    2024-01-01 · 7 citations

    Article · Open access

  • Social Acceptability of Health Behavior Posts on Social Media: An Experiment

    American Journal of Preventive Medicine · 2024-01-06 · 2 citations

    Article · Open access

Frequent coauthors

Labs

  • VYDISWARAN LAB · PI
