Vasant Honavar

Professor and Edward Frymoyer Chair of Information Sciences and Technology; Director, Center for Big Data Analytics and Discovery Informatics; Director, Artificial Intelligence Research Laboratory; Associate · Verified

Pennsylvania State University · Social Data Analytics

Active 1989–2026

h-index: 53
Citations: 14.2k
Papers: 487 (53 in the last 5 years)
Funding: $3.0M
About

Vasant Honavar is a Professor and Edward Frymoyer Chair of Information Sciences and Technology at Pennsylvania State University. He directs the Center for Big Data Analytics and Discovery Informatics and the Artificial Intelligence Research Laboratory. He is also Associate Director of the Institute for CyberScience, a Graduate Faculty member in Social Data Analytics, and a C-SoDA Faculty Affiliate. His research focuses on social data analytics, artificial intelligence, and big data discovery, with an emphasis on developing and applying data-driven methods in the social sciences and informatics. His professional profile is available on the university's faculty webpage and other online platforms.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Machine Learning
  • Political Science
  • Computer Security
  • Engineering
  • Data Mining
  • Sociology
  • Demographic economics
  • Chemistry
  • Electrical engineering
  • Economic geography
  • Demography
  • Engineering physics
  • Social psychology
  • Physics
  • Optoelectronics
  • Nanotechnology
  • Geography
  • Condensed matter physics
  • Economics
  • Medicine
  • Psychology
  • Materials science

Selected publications

  • Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning

    ArXiv.org · 2026-02-27

    article · Open access · Senior author

    Large unimodal foundation models for vision and language encode rich semantic structures, yet aligning them typically requires computationally intensive multimodal fine-tuning. Such approaches depend on large-scale parameter updates, are resource intensive, and can perturb pretrained representations. Emerging evidence suggests, however, that independently trained foundation models may already exhibit latent semantic compatibility, reflecting shared structures in the data they model. This raises a fundamental question: can cross-modal alignment be achieved without modifying the models themselves? Here we introduce HDFLIM (HyperDimensional computing with Frozen Language and Image Models), a framework that establishes cross-modal mappings while keeping pretrained vision and language models fully frozen. HDFLIM projects unimodal embeddings into a shared hyperdimensional space and leverages lightweight symbolic operations (binding, bundling, and similarity-based retrieval) to construct associative cross-modal representations in a single pass over the data. Caption generation emerges from high-dimensional memory retrieval rather than iterative gradient-based optimization. We show that HDFLIM achieves performance comparable to end-to-end vision-language training methods and produces captions that are more semantically grounded than zero-shot baselines. By decoupling alignment from parameter tuning, our results suggest that semantic mapping across foundation models can be realized through symbolic operations on hyperdimensional encodings of the respective embeddings. More broadly, this work points toward an alternative paradigm for foundation model alignment in which frozen models are integrated through structured representational mappings rather than through large-scale retraining. The codebase for our implementation can be found at https://github.com/Abhishek-Dalvi410/HDFLIM.

  • Eliminating Inconsistencies among CP-Theory Qualitative Preferences

    2026-05-24

    article

    Inconsistency in preference reasoning arises when a set of preferences implies that an outcome is preferred over itself. In multi-agent settings, conflicting preferences of the agents lead to inconsistencies in their collective preferences. We examine the problem of establishing consistency by selectively discarding a subset of input preferences, where preferences are expressed qualitatively in CP-theory language. Specifically, we explore two variants: (1) identifying a minimal set of preferences to discard in order to eliminate inconsistencies, and (2) finding a set of preferences whose removal minimally alters the induced dominance so as to eliminate the inconsistencies. We show that both minimization problems are NP-complete. We propose an iterative Integer Linear Programming (ILP)-based approach to their solution. Finally, we present experimental results that demonstrate the feasibility of our solution. We observe that optimizing one objective in isolation often compromises the other. We explore sequential strategies that prioritize one objective followed by the optimization of the other, and propose an empirically balanced approach that achieves improved overall outcomes.

  • Hyperdimensional Cross-Modal Alignment of Frozen Language and Image Models for Efficient Image Captioning

    Open MIND · 2026-02-27

    preprint · Senior author

  • SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

    ArXiv.org · 2025-02-02

    preprint · Open access · Senior author

    Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER, is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches, even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard. The source code for SimPER is publicly available at: https://github.com/tengxiao1/SimPER.

  • CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning

    ArXiv.org · 2025-10-31

    preprint · Open access · Senior author

    State-of-the-art (SOTA) LLMs have progressed from struggling on proof-based Olympiad problems to solving most of the IMO 2025 problems, with leading systems reportedly handling 5 of 6 problems. Given this progress, we assess how well these models can grade proofs: detecting errors, judging their severity, and assigning fair scores beyond binary correctness. We study proof-analysis capabilities using a corpus of 90 Gemini 2.5 Pro-generated solutions that we grade on a 1-4 scale with detailed error annotations, and on MathArena solution sets for IMO/USAMO 2025 scored on a 0-7 scale. Our analysis shows that models can reliably flag incorrect (including subtly incorrect) solutions but exhibit calibration gaps in how partial credit is assigned. To address this, we introduce agentic workflows that extract and analyze reference solutions and automatically derive problem-specific rubrics for a multi-step grading process. We instantiate and compare different design choices for the grading workflows, and evaluate their trade-offs. Across our annotated corpus and MathArena, our proposed workflows achieve higher agreement with human grades and more consistent handling of partial credit across metrics. We release all code, data, and prompts/logs to facilitate future research.

  • Reproducibility of the correlation between race and outcomes associated with treatment of HER2+ breast cancer across databases.

    Journal of Clinical Oncology · 2025-05-28

    article

    e13769 Background: HER2-positive (HER2+) breast cancer (BrCa) is an aggressive subtype, accounting for 20–30% of all BrCa cases. Trastuzumab, a monoclonal anti-HER2 antibody, remains the cornerstone of treatment. Previously, we identified race as a factor associated with increased toxicities and poorer outcomes in HER2+ BrCa treatment using the TriNetX database. Methods: In this propensity score-matched cohort study, we used the TriNetX Research Network to compare mortality rates in non-Hispanic African American (NHAA) and non-Hispanic White (NHW) women with HER2+ BrCa. Cohorts were matched for age, BMI, comorbidities, and laboratory values using 1:1 matching with a greedy nearest neighbor search. To assess the robustness of the results based on the TriNetX data, we compared overall mortality after diagnosis between matched NHAA and NHW cohorts using patient-level data from the SEER and NCDB databases. Results: SEER data indicated that NHAA women had 1.48 times higher odds of death compared to NHW women. NCDB data showed NHAA women had 1.36 times higher odds of death, while TriNetX data revealed 1.10 times higher odds. We examined the effect of marital status and education on mortality, finding that being married and living in areas with higher education levels were associated with lower odds of mortality. Mortality odds decreased as education level increased, with the highest level of education showing the greatest protective effect. The influence of these proxies for socioeconomic status was independent of race. Conclusions: Our findings demonstrate increased odds of mortality for NHAA compared to NHW patients with HER2+ BrCa undergoing Trastuzumab therapy across all three databases. We identified social factors, including education and marital status, as independent predictors of mortality in HER2+ BrCa patients. The consistency of mortality findings across databases supports the reliability of our previous TriNetX results, confirming that NHAA women with HER2+ BrCa face higher odds of mortality and toxic effects compared to their NHW counterparts.

    Table: Effect of marital status on mortality while adjusting for gender, age, stage of disease, and race using SEER.

    Marital status                   Odds ratio (95% CI)   Interpretation
    Divorced (reference)             -                     Baseline group for comparison
    Married (including common law)   0.75 [0.68, 0.83]     25% lower odds of death compared to divorced individuals
    Separated                        1.19 [0.87, 1.62]     Not significantly different from divorced individuals
    Single (never married)           1.18 [1.04, 1.34]     18% higher odds of death compared to divorced individuals
    Unknown                          1.27 [1.06, 1.53]     27% higher odds of death compared to divorced individuals
    Unmarried/Domestic Partner       0.73 [0.37, 1.44]     Not significantly different from divorced individuals
    Widowed                          1.52 [1.34, 1.73]     52% higher odds of death compared to divorced individuals

  • Hyperdimensional Representation Learning for Node Classification and Link Prediction

    2025-02-26 · 4 citations

    article · Open access · Senior author

    We introduce Hyperdimensional Graph Learner (HDGL), a novel method for node classification and link prediction in graphs. HDGL maps node features into a very high-dimensional space (hyperdimensional or HD space for short) using the injectivity property of node representations in a family of Graph Neural Networks (GNNs) and then uses HD operators such as bundling and binding to aggregate information from the local neighborhood of each node yielding latent node representations that can support both node classification and link prediction tasks. HDGL, unlike GNNs that rely on computationally expensive iterative optimization and hyperparameter tuning, requires only a single pass through the data set. We report results of experiments using widely used benchmark datasets which demonstrate that, on the node classification task, HDGL achieves accuracy that is competitive with that of the state-of-the-art GNN methods at substantially reduced computational cost; and on the link prediction task, HDGL matches the performance of DeepWalk and related methods, although it falls short of computationally demanding state-of-the-art GNNs.

  • DiaLLMs: EHR-Enhanced Clinical Conversational System for Clinical Test Recommendation and Diagnosis Prediction

    2025-01-01

    article · Open access · Senior author

    Recent advances in Large Language Models (LLMs) have led to remarkable progress in medical consultation. However, existing medical LLMs overlook the essential role of Electronic Health Records (EHR) and focus primarily on diagnosis recommendation, limiting their clinical applicability. We propose DiaLLM, the first medical LLM that integrates heterogeneous EHR data into clinically grounded dialogues, enabling clinical test recommendation, result interpretation, and diagnosis prediction to better align with real-world medical practice. To construct clinically grounded dialogues from EHR, we design a Clinical Test Reference (CTR) strategy that maps each clinical code to its corresponding description and classifies test results as "normal" or "abnormal". Additionally, DiaLLM employs a reinforcement learning framework for evidence acquisition and automated diagnosis. To handle the large action space, we introduce a reject sampling strategy to reduce redundancy and improve exploration efficiency. Furthermore, a confirmation reward and a class-sensitive diagnosis reward are designed to guide accurate diagnosis prediction. Extensive experimental results demonstrate that DiaLLM outperforms baselines in clinical test recommendation and diagnosis prediction. Our code is available at GitHub.

  • Abstract P2-10-15: Reproducibility of the interactions of race on outcomes and toxicities associated with treatment of HER2+ Breast Cancer across databases

    Clinical Cancer Research · 2025-06-13

    article

    Abstract Objective: Investigate the reproducibility of the interaction of race on outcomes and toxicities associated with HER2+BrCa treatment across multiple data sources. Background: HER2+ breast cancer (BrCa) is an aggressive subtype, accounting for 20-30% of all BrCa cases. Trastuzumab, a monoclonal anti-HER2 antibody, remains the cornerstone of treatment. Previously, we identified race as a factor associated with increased toxicities and poorer outcomes in HER2+BrCa treatment using the TriNetX Database. Further investigation into the outcomes and toxicities in minority groups is necessary, emphasizing the importance of utilizing databases for such research. Design/Methods: In this propensity score-matched cohort study we used the TriNetX Research Network to compare mortality of HER2+BrCa in non-Hispanic African American (NHAA) women to a corresponding non-Hispanic White cohort (NHW). Qualification into the two race-based HER2+BrCa cohorts required the presence of a C50 ICD-10-CM diagnosis code and at least one proxy for HER2 positivity such as Trastuzumab (index event). Cohorts were matched for age, BMI, comorbidities, and lab values using 1:1 matching with a greedy nearest neighbor search. Toxicity outcomes were also compared between the cohorts. The associations of observed outcome frequencies in the two cohorts were tested for significance using the chi-square test. The odds ratios with 95% confidence intervals are reported as an effect size and significance estimation. As a test of robustness of the results based on the TriNetX data, we compared the outcome of all-time mortality after diagnosis across unmatched NHAA and NHW cohorts using the well-established SEER and NCDB patient-level databases. Results: Using SEER, 68,697 patients met the inclusion criteria for HER2+BrCa (57,554 NHW, 11,143 NHAA). The median age at the index event was 60-64 years for NHW and 55-59 years for NHAA. In the unmatched HER2+BrCa cohorts, the odds of death for NHAA were 1.32 times higher than for NHW, with a P-value of < 0.00001. Using NCDB, 352,553 patients met the inclusion criteria (305,219 NHW, 47,334 NHAA). In these cohorts, the odds of death for NHAA were 1.61 times higher than for NHW, also with a P-value of < 0.00001. Using TriNetX, the odds of death at 5 years after diagnosis for NHAA were 1.1379 times higher than for NHW in the unmatched HER2+BrCa cohorts, with a 95% confidence interval. TriNetX data also showed that NHAA women had significantly increased odds of neuropathy and cardiomyopathy at 1, 3, and 5-year intervals after the HER2+BrCa index date, compared to NHW women. Additionally, the odds of an emergency room visit (for any reason) were up to 79% higher in NHAA women compared to their NHW counterparts. There was a lack of detailed toxicity data points, such as neuropathy and cardiomyopathy, in the NCDB and SEER databases, preventing cross-comparison. Conclusion: We found increased odds of mortality for NHAA patients compared to NHW patients with HER2+BrCa undergoing Trastuzumab therapy across all three databases. Although toxicity data could not be directly compared across sources due to lack of granularity, the consistency of mortality findings highlights the reliability of our previous results from the TriNetX Database. In TriNetX, NHAA women with HER2+BrCa demonstrated higher odds of experiencing neuropathy, cardiomyopathy, and emergency room visits compared to their NHW counterparts. These results emphasize the importance of considering racial impact on HER2+BrCa outcomes and highlight the utility of real-world data sources like TriNetX for exploring BrCa outcomes and treatment toxicities. Our findings suggest the need for further research and targeted interventions to address disparities and improve treatment outcomes in diverse populations. Citation Format: Britney Fitzgerald, Justin Petucci, Vasant Honavar, Monali Vasekar. Reproducibility of the interactions of race on outcomes and toxicities associated with treatment of HER2+ Breast Cancer across databases [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2024; 2024 Dec 10-13; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2025;31(12 Suppl):Abstract nr P2-10-15.

  • Reinforcement Learning for Large Language Models via Group Preference Reward Shaping

    2025-01-01

    article · Open access · Senior author

    Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Zhimeng Guo, Shijie Zhou, Shuyue Hu, Vasant G. Honavar. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

Frequent coauthors

  • Drena Dobbs

    Iowa State University

    59 shared
  • Yasser EL‐Manzalawy

    Geisinger Health System

    46 shared
  • Doina Caragea

    44 shared
  • Samik Basu

    Indian Statistical Institute

    38 shared
  • Adrian Silvescu

    34 shared
  • Ganesh Ram Santhanam

    Iowa State University

    29 shared
  • Jie Bao

    Tsinghua University

    28 shared
  • Karthik Balakrishnan

    Stanford Health Care

    27 shared

Education

  • PhD, Computer Science

    University of Wisconsin-Madison

    1990
  • M.S., Computer Science

    University of Wisconsin-Madison

    1989
  • M.S., Electrical and Computer Engineering

    Drexel University

    1984
  • B.E., Electronics Engineering

    Bangalore University

    1982