Cynthia D. Rudin

Gilbert, Louis, and Edward Lehrman Distinguished Professor · Verified

Duke University · Computer Science

Active 2003–2026

h-index: 56
Citations: 18.8k
Papers: 419 (177 in the last 5 years)
Funding: $3.4M (2 active)

About

Cynthia D. Rudin is a professor of computer science, electrical and computer engineering, statistical science, and biostatistics & bioinformatics at Duke University. She directs the Interpretable Machine Learning Lab and holds the Gilbert, Louis, and Edward Lehrman Distinguished Professorship. She earned her undergraduate degree at the University at Buffalo and her PhD at Princeton University, and previously held positions at MIT, Columbia, and NYU. Her research covers artificial intelligence, machine learning, and data science, with an emphasis on interpretability and practical applications.

Rudin has received numerous awards, including the 2022 Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from AAAI, a prize that has been described as comparable to the Nobel Prize and the Turing Award. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award, was named one of the "Top 40 Under 40" by Poets and Quants, and was listed among the most impressive professors at MIT by Business Insider. She is a fellow of the American Statistical Association and of the Institute of Mathematical Statistics.

She has chaired sections within INFORMS and the American Statistical Association, and has served on committees for DARPA, the National Institute of Justice, AAAI, ACM SIGKDD, and three committees of the National Academies of Sciences, Engineering, and Medicine. She has delivered keynote and invited talks at major conferences such as KDD, AISTATS, and the Nobel Conference. Her work has been featured in news outlets including The New York Times, The Washington Post, The Wall Street Journal, and NPR.

Research topics

  • Machine Learning
  • Artificial Intelligence
  • Computer Science
  • Mathematics
  • Data Mining
  • Theoretical Computer Science

Selected publications

  • AutoSchA: Automatic Hierarchical Music Representations via Multi-Relational Node Isolation

    Proceedings of the AAAI Conference on Artificial Intelligence · 2026-03-14

    Article · Open access

    Hierarchical representations provide powerful and principled approaches for analyzing many musical genres. Such representations have been broadly studied in music theory, for instance via Schenkerian analysis (SchA). Hierarchical music analyses, however, are highly cost-intensive; the analysis of a single piece of music requires a great deal of time and effort from trained experts. The representation of hierarchical analyses in a computer-readable format is also a further challenge. Given recent developments in hierarchical deep learning and increasing quantities of computer-readable data, there is great promise in extending such work for an automatic hierarchical representation framework. This paper thus introduces a novel approach, AutoSchA, which extends recent developments in graph neural networks (GNNs) for hierarchical music analysis. AutoSchA features three key contributions: 1) a new graph learning framework for hierarchical music representation, 2) a new graph pooling mechanism based on node isolation that directly optimizes learned pooling assignments, and 3) a state-of-the-art architecture that integrates such developments for automatic hierarchical music analysis. We show, in a suite of experiments, that AutoSchA performs comparably to human experts when analyzing Baroque fugue subjects.

  • NF-κB dependent gene expression and plasma IL-1β, TNFα and GCSF drive transcriptomic diversity and CD4:CD8 ratio in people with HIV on ART

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-02-14

    Preprint · Open access

    Despite antiretroviral therapy (ART), people with HIV (PWH) on ART experience higher rates of morbidity and mortality vs. age-matched HIV negative controls, which may be driven by chronic inflammation due to persistent virus. We performed bulk RNA sequencing (RNA-seq) on peripheral CD4+ T cells, as well as quantified plasma immune marker levels from 154 PWH on ART to identify host immune signatures associated with immune recovery (CD4:CD8) and HIV persistence (cell-associated HIV DNA and RNA). Using a novel dimension reduction tool - Pairwise Controlled Manifold Approximation (PaCMAP), we defined three distinct participant transcriptomic clusters. We found that these three clusters were largely defined by differential expression of genes regulated by the transcription factor NF-κB. While clustering was not associated with HIV reservoir size, we observed an association with CD4:CD8 ratio, a marker of immune recovery and prognostic factor for mortality in PWH on ART. Furthermore, distinct patterns of plasma IL-1β, TNF-α and GCSF were also strongly associated with the clusters, suggesting that these immune markers play a key role in CD4+ T cell transcriptomic diversity and immune recovery in PWH on ART. These findings reveal novel subgroups of PWH on ART with distinct immunological characteristics, and define a transcriptional signature associated with clinically significant immune parameters for PWH. A deeper understanding of these subgroups could advance clinical strategies to treat HIV-associated immune dysfunction.

  • Replication Data for: Matching Bounds: How Choice of Matching Algorithm Impacts Treatment Effects Estimates and What to Do About It.

    Harvard Dataverse · 2025-10-22

    Dataset · Open access · Senior author

    Many major works in social science employ matching to make causal conclusions, but different matches on the same data may produce different treatment effect estimates, even when they achieve similar balance or minimize the same loss function. We discuss reasons and consequences of this problem. We present evidence of this problem by replicating ten papers that use matching and we find that different popular matching algorithms produce inconsistent results. We introduce Matching Bounds: a finite-sample, nonstochastic method that allows analysts to know whether a matched sample that produces different results with the same levels of balance and overall match quality could be obtained from their data. We apply Matching Bounds to a replication of two studies and show that in one case results are robust to this issue and in another they are not.

  • How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data

    Proceedings of the AAAI Conference on Artificial Intelligence · 2025-04-11

    Article · Open access · Senior author

    Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. Only recently has fine-grained spatial and temporal data become available to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information. Leveraging this new resource, we use a variety of variable importance techniques to robustly identify the most informative predictors across multiple health outcomes. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR) to analyze both local and global spatial dependencies of each variable on various health outcomes. Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation. Regional analyses reveal local variations with air pollution and solar radiation, with notable shifts during COVID. This comprehensive approach provides actionable insights for addressing health disparities, and advocates for the integration of interpretable machine learning in public health.

  • Matching Bounds: How Choice of Matching Algorithm Impacts Treatment Effects Estimates and What to Do About It.

    The Journal of Politics · 2025-12-08

    Article · Open access · Senior author

    Many major works in social science employ matching to make causal conclusions, but different matches on the same data may produce different treatment effect estimates, even when they achieve similar balance or minimize the same loss function. We discuss reasons and consequences of this problem. We present evidence of this problem by replicating ten papers that use matching and we find that different popular matching algorithms produce inconsistent results. We introduce Matching Bounds: a finite-sample, nonstochastic method that allows analysts to know whether a matched sample that produces different results with the same levels of balance and overall match quality could be obtained from their data. We apply Matching Bounds to a replication of two studies and show that in one case results are robust to this issue and in another they are not.

  • NodMAISI: Nodule-Oriented Medical AI for Synthetic Imaging

    ArXiv.org · 2025-12-19

    Article · Open access

    Objective: Although medical imaging datasets are increasingly available, abnormal and annotation-intensive findings critical to lung cancer screening, particularly small pulmonary nodules, remain underrepresented and inconsistently curated. Methods: We introduce NodMAISI, an anatomically constrained, nodule-oriented CT synthesis and augmentation framework trained on a unified multi-source cohort (7,042 patients, 8,841 CTs, 14,444 nodules). The framework integrates: (i) a standardized curation and annotation pipeline linking each CT with organ masks and nodule-level annotations, (ii) a ControlNet-conditioned rectified-flow generator built on MAISI-v2's foundational blocks to enforce anatomy- and lesion-consistent synthesis, and (iii) lesion-aware augmentation that perturbs nodule masks (controlled shrinkage) while preserving surrounding anatomy to generate paired CT variants. Results: Across six public test datasets, NodMAISI improved distributional fidelity relative to MAISI-v2 (real-to-synthetic FID range 1.18 to 2.99 vs 1.69 to 5.21). In lesion detectability analysis using a MONAI nodule detector, NodMAISI substantially increased average sensitivity and more closely matched clinical scans (IMD-CT: 0.69 vs 0.39; DLCS24: 0.63 vs 0.20), with the largest gains for sub-centimeter nodules where MAISI-v2 frequently failed to reproduce the conditioned lesion. In downstream nodule-level malignancy classification trained on LUNA25 and externally evaluated on LUNA16, LNDbv4, and DLCS24, NodMAISI augmentation improved AUC by 0.07 to 0.11 at <=20% clinical data and by 0.12 to 0.21 at 10%, consistently narrowing the performance gap under data scarcity.

  • Graph-based design of irregular metamaterials

    International Journal of Mechanical Sciences · 2025-04-13 · 4 citations

    Article

  • Doctor Rashomon and the UNIVERSE of Madness: Variable Importance with Unobserved Confounding and the Rashomon Effect

    ArXiv.org · 2025-10-14

    Preprint · Open access · Senior author

    Variable importance (VI) methods are often used for hypothesis generation, feature selection, and scientific validation. In the standard VI pipeline, an analyst estimates VI for a single predictive model with only the observed features. However, the importance of a feature depends heavily on which other variables are included in the model, and essential variables are often omitted from observational datasets. Moreover, the VI estimated for one model is often not the same as the VI estimated for another equally-good model - a phenomenon known as the Rashomon Effect. We address these gaps by introducing UNobservables and Inference for Variable importancE using Rashomon SEts (UNIVERSE). Our approach adapts Rashomon sets - the sets of near-optimal models in a dataset - to produce bounds on the true VI even with missing features. We theoretically guarantee the robustness of our approach, show strong performance on semi-synthetic simulations, and demonstrate its utility in a credit risk task.

  • How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data

    arXiv (Cornell University) · 2025-01-03

    Preprint · Open access · Senior author

    Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. Only recently has fine-grained spatial and temporal data become available to study these effects, namely the MEDSAT dataset of English health, environmental, and sociodemographic information. Leveraging this new resource, we use a variety of variable importance techniques to robustly identify the most informative predictors across multiple health outcomes. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR) to analyze both local and global spatial dependencies of each variable on various health outcomes. Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation. Regional analyses reveal local variations with air pollution and solar radiation, with notable shifts during COVID. This comprehensive approach provides actionable insights for addressing health disparities, and advocates for the integration of interpretable machine learning in public health.

Frequent coauthors

  • Alina Jade Barnett

    Duke University

    38 shared
  • Alexander Volfovsky

    Duke University

    32 shared
  • M. Brandon Westover

    Harvard University

    31 shared
  • Margo Seltzer

    28 shared
  • Wendong Ge

    Beth Israel Deaconess Medical Center

    24 shared
  • Edward P. Browne

    University of North Carolina at Chapel Hill

    23 shared
  • Chaofan Chen

    Southeast University

    23 shared
  • Lesia Semenova

    Duke University

    22 shared

Awards & honors

  • 2022 Squirrel AI Award for Artificial Intelligence for the B…
  • Three-time winner of the INFORMS Innovative Applications in…
  • Fellow of the American Statistical Association
  • Fellow of the Institute of Mathematical Statistics
  • Named as one of the "Top 40 Under 40" by Poets and Quants in…