Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

  • Free to start
  • No credit card
  • Cancel anytime

Olivier Gevaert

Verified

Stanford University · Rheumatology

Active 2005–2026

h-index: 60
Citations: 15.8k
Papers: 490 (264 in the last 5 years)
Funding: $6.5M

About

Olivier Gevaert is an Assistant Professor of Medicine (Biomedical Informatics) and of Biomedical Data Science at Stanford University, and is affiliated with the Center for Artificial Intelligence in Medicine & Imaging (AIMI). His research applies artificial intelligence and data science to medicine and medical imaging, with the goal of developing and integrating AI methods that improve healthcare outcomes and medical image analysis.

Research topics

  • Computer Science
  • Artificial Intelligence
  • Data Science
  • Medicine
  • Machine Learning
  • Pathology
  • Data Mining
  • Geology
  • Internal Medicine
  • Radiology
  • Biology
  • Nuclear Medicine
  • Computational Biology
  • Genetics
  • Medical Physics
  • Simulation
  • Engineering
  • Structural Engineering

Selected publications

  • MAM-CLIP: Vision-Language Pretraining on Mammography Atlases for BI-RADS Classification

    arXiv · 2026-05-19

    article · Open access · Senior author

    Deep learning methods have demonstrated promising results in predicting BI-RADS scores from mammography images. However, the interpretation of these images can vary, leading to discrepancies even among radiologists. Given the inherent complexity of mammograms, training classification models solely on image labels often yields limited performance. To address this challenge, we curated 2313 mammogram images and their corresponding captions from two mammography atlases. Our proposed approach employs a multi-modal model that uses a pretrained PubMedBERT as the language component. By training this model on image-text pairs with contrastive learning, we enable the vision encoder to absorb the rich information contained in the captions, thereby improving its understanding of mammography findings. We then fine-tune the vision encoder on two datasets for BI-RADS prediction, achieving superior performance compared with models trained without this pretraining, particularly when labeled samples are scarce. The improvement in the 3-class average F1 score ranges from +1% to +14%: a +1% increase with 40K training samples, and a +14% increase with 1K samples. Furthermore, our experiments reveal that 2K image-text pairs from mammography atlases can be more informative than 2K labeled samples for label prediction, with an average margin of +1.1% when more than 10K training samples are available. Overall, our work provides a vision-language model for mammography and highlights the value of textual information from mammography atlases. In addition, we publicly release preprocessed mammography images of the TEKNOFEST dataset. The training code, pre-trained model weights, data extraction scripts, and the released dataset are publicly available at: https://github.com/igulluk/MAM-CLIP

  • Improving Medical VQA through Trajectory-Aware Process Supervision

    arXiv · 2026-04-10

    article · Open access · Senior author

    Reasoning capabilities are crucial for reliable medical visual question answering (VQA); however, existing datasets rarely include reasoning explanations. We address this by generating reasoning trajectories for six medical VQA benchmarks using the COMCTS algorithm with open-source vision-language models, with an LLM serving as the verification judge. Building on these generated datasets, we propose a two-stage training framework: supervised fine-tuning followed by Group Relative Policy Optimization (GRPO) with a novel process-based reward. While standard approaches rely solely on exact-match rewards for final answers, we introduce a trajectory-aware reward that measures the similarity between generated and ground-truth reasoning processes. Specifically, we embed reasoning steps using sentence transformers and compute the Dynamic Time Warping (DTW) distance between the resulting vector sequences. Experiments across six benchmarks demonstrate that combining the DTW-based process reward with exact-match reward consistently outperforms SFT-only training, raising mean accuracy from 0.598 to 0.689, mean BERTScore from 0.845 to 0.881, and mean ROUGE-L from 0.665 to 0.748. Our results highlight the importance of process supervision in training reasoning-capable medical VLMs. We make our code and generated reasoning datasets publicly available at https://anonymous.4open.science/r/MICCAI-R1-MED-VQA-code-B14B/

  • SemEnrich: Self-Supervised Semantic Enrichment of Radiology Reports for Vision-Language Learning

    arXiv · 2026-04-10

    article · Open access · Senior author

    Medical vision-language datasets are often limited in size and biased toward negative findings: clinicians mostly report abnormalities and may omit positive or neutral findings they consider irrelevant to the patient's condition. We propose a self-supervised data enrichment method that leverages semantic clustering of report sentences; we then enrich the findings in the training-set reports by adding positive/neutral observations from different clusters in a self-supervised manner. Our approach yields consistent gains in supervised fine-tuning (average gains of 5.63%, 3.04%, 7.40%, 5.30%, and 7.47% in COMET score, BERTScore, sentence BLEU, CheXbert-F1, and RadGraph-F1, respectively). Ablation studies confirm that the improvements stem from semantic clustering rather than random augmentation. Furthermore, we introduce a way to incorporate semantic cluster information into the reward design for GRPO training, which leads to further gains (average gains of 2.78%, 3.14%, and 12.80% in COMET score, BERTScore, and sentence BLEU, respectively). We share our code at https://anonymous.4open.science/r/SemEnrich-75CF

  • SAGE-FM: A lightweight and interpretable spatial transcriptomics foundation model

    arXiv · 2026-01-21

    article · Open access · Senior author

    Spatial transcriptomics enables spatial gene expression profiling, motivating computational models that capture spatially conditioned regulatory relationships. We introduce SAGE-FM, a lightweight spatial transcriptomics foundation model based on graph convolutional networks (GCNs) trained with a masked central spot prediction objective. Trained on 416 human Visium samples spanning 15 organs, SAGE-FM learns spatially coherent embeddings that robustly recover masked genes, with 91% of masked genes showing significant correlations (p < 0.05). The embeddings generated by SAGE-FM outperform MOFA and existing spatial transcriptomics methods in unsupervised clustering and preservation of biological heterogeneity. SAGE-FM generalizes to downstream tasks, enabling 81% accuracy in pathologist-defined spot annotation in oropharyngeal squamous cell carcinoma and improving glioblastoma subtype prediction relative to MOFA. In silico perturbation experiments further demonstrate that the model captures directional ligand-receptor and upstream-downstream regulatory effects consistent with ground truth. These results demonstrate that simple, parameter-efficient GCNs can serve as biologically interpretable and spatially aware foundation models for large-scale spatial transcriptomics.

  • Abstract 2775: Deep-learning CT biomarker improves early efficacy detection in simulated randomized phase II NSCLC trials.

    Cancer Research · 2026-04-03

    article

    Background: Early decision-making in advanced non-small cell lung cancer (NSCLC) phase II trials is limited by the modest ability of objective response and progression-free survival (PFS) to detect early biological activity or predict overall survival (OS). Quantitative deep-learning analysis of routine CT imaging may offer a more sensitive measure that better reflects long-term benefit. We evaluated whether Serial CTRS, a fully automated CT-based deep-learning imaging biomarker, could improve early efficacy detection in simulated randomized phase II NSCLC trials.

    Methods: We evaluated the utility of Serial CTRS using data from the randomized phase III trial of cetuximab plus carboplatin/paclitaxel with or without bevacizumab in advanced NSCLC, which did not meet its co-primary endpoints of PFS in patients with EGFR FISH-positive cancer and OS in the entire study population (SWOG S0819; N=1275). Serial CTRS is a convolutional-neural-network pipeline, trained on a large real-world advanced NSCLC dataset, using paired baseline and follow-up thoracic CT scans to generate a continuous imaging score without manual annotation. To quantify OS surrogacy, we repeatedly sampled 1000 pairs of random 50-patient arms from the full cohort and correlated Serial CTRS differences at 8, 16, and 24 weeks with final OS hazard ratios (HR), comparing results with best overall response (BOR) and PFS. To simulate a positive phase II trial, we constructed a balanced subset (target OS HR≈0.50) using stratified pruning matched on randomization factors. We then simulated 1000 two-arm phase II trials (n=50/arm) with realistic staggered enrollment (averaging 1 patient/day) and interim analyses (IA) at 12-48 weeks from study start. PFS was evaluated via log-rank tests and Serial CTRS differences via Wilcoxon rank-sum tests (α=0.05). False-positive rates were evaluated through null simulations using the full dataset.

    Results: Serial CTRS differences showed increasing concordance with OS HR across timepoints (R² = 0.10, 0.23, 0.35 at 8, 16, and 24 weeks), outperforming BOR (R² = 0.08) and PFS (R² = 0.09, 0.20, 0.28). In the simulated phase II trials, the biomarker achieved 60% (95% CI 58-62%) power and 66% (63-69%) power at 36 weeks to detect a long-term survival benefit while maintaining a 5-6% false-positive rate. BOR achieved 35% (33-37%) power, and PFS achieved 49% (46-51%) and 50% (48-52%) at the same timepoints.

    Conclusions: A fully automated deep-learning CT biomarker provided earlier and more reliable efficacy readouts than BOR and PFS in simulated phase II NSCLC trials. These results suggest that quantitative CT biomarkers using the full thoracic scan can strengthen early drug-development decisions by improving power and reducing uncertainty around early activity signals. Ongoing work is focused on broader evaluation across tumor types, therapeutic modalities, and additional clinical datasets.

    Citation Format: Chiharu Sako, Brenda F. Kurland, Taly G. Schmidt, Dwight H. Owen, Arpan A. Patel, Nicholas C. Love, Olivier Gevaert, George R. Simon, Ravi B. Parikh, Petr Jordan. Deep-learning CT biomarker improves early efficacy detection in simulated randomized phase II NSCLC trials [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2775.

  • The ADAPT learning cancer treatment system: ARPA-H’s initiative to revolutionize cancer therapy

    Cancer Cell · 2026-01-08

    article · Open access

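The MAM-CLIP abstract above pretrains the vision encoder with contrastive learning on mammogram image-caption pairs. As a rough illustration of that general technique (not the authors' code, which is in their repository), here is a minimal pure-Python sketch of a CLIP-style symmetric contrastive loss. The function names, the temperature of 0.07, and the toy two-dimensional embeddings are assumptions for illustration; a real pipeline would use vision-encoder and PubMedBERT embeddings inside a deep-learning framework.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_cross_entropy_diagonal(logits):
    """Mean cross-entropy when the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: image i should match caption i and no other caption."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # logits[i][j]: cosine similarity of image i and caption j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in texts] for im in images]
    text_to_image = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_cross_entropy_diagonal(logits) + mean_cross_entropy_diagonal(text_to_image))


# Toy embeddings (hypothetical): caption i points roughly the same way as image i
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.1, 0.9]]
aligned = clip_loss(images, captions)
shuffled = clip_loss(images, captions[::-1])  # mismatched pairs should cost far more
```

Averaging the image-to-text and text-to-image directions is what makes the loss symmetric: each caption must also pick out its own image, which is how caption content can shape the vision encoder.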
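The medical VQA abstract above rewards a generated reasoning trajectory by the Dynamic Time Warping (DTW) distance between sentence-transformer embeddings of its steps and those of the ground-truth trajectory. A minimal sketch of DTW over two sequences of vectors follows, assuming cosine distance as the per-step cost and a hypothetical `1 / (1 + d)` mapping from distance to reward; the paper's exact step cost and reward scaling are not given in the abstract, and the toy vectors stand in for real embeddings.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b, step_cost=cosine_distance):
    """Classic O(n*m) dynamic-programming DTW between two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of the first i steps of seq_a with the first j of seq_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = step_cost(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def process_reward(generated_steps, reference_steps):
    """Hypothetical mapping of DTW distance to a reward in (0, 1]; not taken from the paper."""
    return 1.0 / (1.0 + dtw_distance(generated_steps, reference_steps))


# Toy step embeddings (hypothetical stand-ins for sentence-transformer outputs)
reference = [[1.0, 0.0], [0.0, 1.0]]
generated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # same reasoning, first step repeated
print(dtw_distance(generated, reference))  # 0.0: DTW absorbs the repeated step
print(process_reward(generated, reference))  # 1.0
```

Unlike a step-by-step comparison, DTW tolerates trajectories of different lengths, which is why a reasoning chain that repeats or splits a step is not penalized as long as the content aligns.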
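Both the medical VQA and SemEnrich abstracts above train with Group Relative Policy Optimization (GRPO). Its distinguishing step is that each sampled response's reward is standardized against the other responses sampled for the same prompt, so no learned value function is needed. A minimal sketch of that advantage computation follows; the reward values are hypothetical, and implementations differ on whether they use the population or the sample standard deviation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize each reward within its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same prompt
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
# Above-average responses get positive advantages, below-average ones negative;
# a group with identical rewards yields all-zero advantages (no learning signal).
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

The `eps` term only guards against division by zero when every response in a group earns the same reward.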
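The SAGE-FM abstract above describes graph convolutional networks trained to predict a masked central spot from its spatial neighbours. A minimal sketch of one standard GCN propagation step (symmetric normalization with self-loops, as in Kipf and Welling) applied to a toy neighbourhood with the central spot's expression hidden is shown below. The graph, the expression values, and the omission of learned weights and nonlinearities are simplifying assumptions; the abstract does not specify SAGE-FM's exact layer design.

```python
import math


def normalized_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_propagate(adj, features):
    """One linear propagation step; learned weights and nonlinearity omitted for clarity."""
    norm = normalized_adjacency(adj)
    n, f = len(features), len(features[0])
    return [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)] for i in range(n)]


def masked_mse(predicted, target):
    """Reconstruction error on the hidden spot, the kind of quantity a masked objective minimizes."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical neighbourhood: spot 0 is the central spot, spots 1-4 are its neighbours
adj = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
expression = [[5.0, 1.0], [4.0, 1.0], [6.0, 1.0], [5.0, 2.0], [5.0, 0.0]]  # toy values, 2 genes
masked = [[0.0, 0.0]] + expression[1:]  # hide the central spot's expression
prediction = gcn_propagate(adj, masked)[0]  # estimate for the central spot
loss = masked_mse(prediction, expression[0])
```

Without learned weights the estimate is only a degree-normalized sum of the neighbours' profiles; training on the masked-prediction objective is what fits weight matrices so that this aggregate reconstructs the hidden expression.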
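The methods of Abstract 2775 above compare arms with Wilcoxon rank-sum tests at α = 0.05 across 1000 simulated two-arm trials with 50 patients per arm. A minimal sketch of that style of Monte Carlo power calculation follows, assuming normally distributed biomarker scores with a hypothetical mean shift between arms, no tied values, a normal approximation to the rank-sum distribution, and no staggered enrollment or interim analyses. It does not use the SWOG S0819 data.

```python
import math
import random


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation; assumes no tied values."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the 1-based ranks that belong to sample x
    rank_sum_x = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulated_power(effect, n_per_arm=50, n_trials=1000, alpha=0.05, seed=0):
    """Share of simulated two-arm trials in which the rank-sum test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_arm)]  # hypothetical shift
        if rank_sum_p_value(treated, control) < alpha:
            rejections += 1
    return rejections / n_trials


false_positive_rate = simulated_power(effect=0.0)  # null scenario: should sit near alpha
power = simulated_power(effect=0.5)  # hypothetical standardized effect size
```

Running the same simulation with no effect estimates the false-positive rate, mirroring the null simulations the abstract uses to check that the biomarker keeps its error rate near the nominal level.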
Frequent coauthors

  • Daniel T. Chang

    1211 shared
  • Sylvia K. Plevritis

    950 shared
  • Gary K. Steinberg

    Stanford Medicine

    901 shared
  • Erik P. Sulman

    New York University

    900 shared
  • Lih‐Shen Chin

    Shanghai University of Traditional Chinese Medicine

    900 shared
  • N. Saito

    900 shared
  • Kelsey Hopkins

    Purdue University West Lafayette

    900 shared
  • Ivan Smirnov

    University of California, San Francisco

    900 shared

Education

  • Ph.D., Biomedical Informatics

    Stanford University

    2015
  • M.S., Biomedical Informatics

    Stanford University

    2011
  • B.S., Computer Science

    University of Ghent

    2007