
Marylyn D Ritchie
Ph.D. · University of Pennsylvania · Rehabilitation Medicine
Active 1993–2026
About
Marylyn D Ritchie, PhD, is an Adjunct Professor of Genetics at the University of Pennsylvania's Perelman School of Medicine. She is Director of the Institute for Biomedical Informatics and Vice President for Research Informatics at the University of Pennsylvania Health System. She also directs the Division of Informatics in the Department of Biostatistics, Epidemiology, and Informatics. Her research expertise spans computational genomics, bioinformatics, epistasis, pharmacogenomics, big data, evolutionary computation, genetic epidemiology, statistical genetics, systems genomics, and translational informatics, with a focus on cardiovascular disease. Dr. Ritchie applies computational and statistical methods to uncover the genetic and molecular mechanisms underlying complex diseases.
Research topics
- Genetics
- Biology
- Medicine
- Evolutionary biology
- Internal medicine
- Computer Science
- Political Science
- Bioinformatics
- Computational biology
- Data science
- Pathology
- Endocrinology
- Virology
- Environmental health
- Psychiatry
- Medical emergency
- Demography
- Cardiology
- Surgery
- Immunology
- Clinical psychology
Selected publications
Nature Medicine · 2026-04-15
article · Open access
Abstract: Individuals of African ancestry carrying APOL1 (apolipoprotein L1) high-risk genotypes face a markedly increased risk of kidney failure, yet tools to identify those likely to progress to chronic kidney disease are lacking. Here we profiled plasma proteomes of 851 Penn Medicine BioBank participants of African ancestry (285 males and 566 females) with APOL1 high-risk genotypes and preserved estimated glomerular filtration rate (eGFR ≥60 ml/min/1.73 m²). Using elastic net Cox regression adjusted for age, sex, eGFR and albuminuria, we derived a nine-protein APOL1 Proteomic Risk Score (APRS) that predicts a composite outcome of ≥40% eGFR decline, kidney failure or death. APRS achieved a time-dependent area under the receiver operating characteristic curve (tAUC) of 86.5%, outperforming the Kidney Failure Risk Equation (66.1%) and polygenic risk scores, with 10-year event rates ranging from 3.3% to 62.5% across risk quintiles. External validation in the Atherosclerosis Risk in Communities and UK Biobank cohorts confirmed robust accuracy (tAUC 82–85%) and consistent performance across demographic and clinical subgroups. Plasma levels of APRS component proteins correlated with kidney tissue fibrosis and tubular injury pathways, indicating strong biological plausibility. By enabling early and accurate prediction of disease progression in APOL1 high-risk individuals, APRS bridges the gap between genetic susceptibility and clinical translation. This scalable and biologically informed approach provides a precision medicine framework for early intervention and may accelerate development of APOL1-targeted therapies to reduce kidney disease disparities.
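A Cox-based protein risk score of this kind is, in general, a weighted sum of protein levels (the model's linear predictor), and risk quintiles are cut on the cohort's score distribution. The sketch below illustrates only that arithmetic; the protein names and weights are hypothetical placeholders, not the nine published APRS proteins or their coefficients.

```python
from statistics import quantiles

# Hypothetical coefficients (log hazard ratios per unit of standardized level).
# These are placeholders, NOT the published APRS proteins or weights.
WEIGHTS = {"protein_a": 0.42, "protein_b": -0.15, "protein_c": 0.30}


def risk_score(levels: dict, weights: dict = WEIGHTS) -> float:
    """Cox linear predictor: sum of coefficient * standardized protein level."""
    return sum(w * levels[name] for name, w in weights.items())


def quintile(score: float, cutpoints: list) -> int:
    """1-based risk quintile (1 = lowest) given the four cohort cutpoints."""
    return 1 + sum(score > c for c in cutpoints)


# Toy cohort: only protein_a varies, so the score rises with it.
cohort = [{"protein_a": a / 10, "protein_b": 0.0, "protein_c": 0.0} for a in range(10)]
scores = [risk_score(p) for p in cohort]
cuts = quantiles(scores, n=5)  # four cutpoints splitting the cohort into fifths
groups = [quintile(s, cuts) for s in scores]
```

In practice the cutpoints would be fixed in the derivation cohort and reused when scoring new individuals, so that quintile membership is comparable across cohorts.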
Enabling Few-Shot Alzheimer's Disease Diagnosis on Biomarker Data with Tabular LLMs
ArXiv.org · 2025-07-31
preprint · Open access
Early and accurate diagnosis of Alzheimer's disease (AD), a complex neurodegenerative disorder, requires analysis of heterogeneous biomarkers (e.g., neuroimaging, genetic risk factors, cognitive tests, and cerebrospinal fluid proteins) typically represented in a tabular format. With flexible few-shot reasoning, multimodal integration, and natural-language-based interpretability, large language models (LLMs) offer unprecedented opportunities for prediction with structured biomedical data. We propose a novel framework called TAP-GPT (Tabular Alzheimer's Prediction GPT) that adapts TableGPT2, a multimodal tabular-specialized LLM originally developed for business intelligence tasks, for AD diagnosis using structured biomarker data with small sample sizes. Our approach constructs few-shot tabular prompts using in-context learning examples from structured biomedical data and fine-tunes TableGPT2 using parameter-efficient QLoRA adaptation for a clinical binary classification task of AD versus cognitively normal (CN). The TAP-GPT framework harnesses the tabular understanding ability of TableGPT2 and the encoded prior knowledge of LLMs to outperform more advanced general-purpose LLMs and a tabular foundation model (TFM) developed for prediction tasks. To our knowledge, this is the first application of LLMs to prediction using tabular biomarker data, paving the way for future LLM-driven multi-agent frameworks in biomedical informatics.
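Few-shot tabular prompting, as described above, places labeled example rows in the prompt as in-context demonstrations, followed by the unlabeled row to classify. A minimal sketch of that serialization step follows; the column names, values, and prompt wording are hypothetical and are not the TAP-GPT template, and no model call or QLoRA fine-tuning is shown.

```python
def to_markdown_table(rows: list, columns: list) -> str:
    """Serialize rows (dicts) as a Markdown table, one record per line."""
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "|".join(" --- " for _ in columns) + "|"
    body = ["| " + " | ".join(str(r[c]) for c in columns) + " |" for r in rows]
    return "\n".join([header, divider, *body])


def build_prompt(examples: list, query: dict, features: list) -> str:
    """In-context examples carry their label; the query row's label is left as '?'."""
    columns = features + ["diagnosis"]
    shots = to_markdown_table(examples, columns)
    target = to_markdown_table([{**query, "diagnosis": "?"}], columns)
    return (
        "Each row is one participant. Diagnosis is AD or CN.\n\n"
        f"Labeled examples:\n{shots}\n\n"
        f"Predict the diagnosis for this participant (answer AD or CN):\n{target}"
    )
```

For example, `build_prompt([{"age": 72, "mmse": 21, "diagnosis": "AD"}], {"age": 75, "mmse": 23}, ["age", "mmse"])` yields one labeled demonstration table and one query table whose diagnosis cell is `?`.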
medRxiv · 2025-06-02 · 1 citations
preprint · Open access · Senior author · Corresponding
Alzheimer's disease (AD) is the most prevalent condition affecting the aging population, with no effective treatment or single underlying causal factor identified. As a complex disease, AD has proven difficult to characterize in terms of genetic risk; polygenic scores (PGS) rely exclusively on common variants and fail to fully capture disease heterogeneity. This study used univariate and multivariate approaches to characterize AD risk. Genome-, transcriptome-, and proteome-wide association studies (GWAS, TWAS, and PWAS) were conducted on 15,480 individuals from the Alzheimer's Disease Sequencing Project (ADSP) R4 release to identify AD-associated signals, followed by pathway enrichment analysis. Integrative risk models (IRMs) were developed using genetically regulated components of gene and protein expression and clinical covariates. Elastic-net logistic regression and random forest classifiers were evaluated using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), F1-score, and balanced accuracy. These IRMs were compared against baseline PGS and covariate models. We identified 104 genomic, 319 transcriptomic, and 17 proteomic associations with AD passing significance thresholds. Putatively novel associations were enriched in signaling, myeloid differentiation, and immune pathways. The best-performing IRM, a random forest with transcriptomic and covariate features, achieved an AUROC of 0.703 and an AUPRC of 0.622, significantly outperforming PGS and baseline models. Integrating univariate discovery approaches with multivariate modeling enhances AD risk prediction and offers insights into underlying biological processes.
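The four metrics used above to compare models can be computed directly from labels and scores. This is a minimal pure-Python sketch for binary labels, assuming both classes are present and, for AUPRC, no tied scores; a tested library implementation would normally be preferred.

```python
def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auprc(y_true, scores):
    """Average precision: mean precision at the rank of each true case (no ties assumed)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def confusion(y_true, y_pred):
    """Counts of true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2
```

AUPRC and F1 focus on the positive class, so their values depend on case prevalence, whereas AUROC and balanced accuracy do not; reporting both kinds gives a fuller picture when cases are rare.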
Genetic polymorphisms and adverse reactions to antituberculosis therapy
Pharmacogenomics · 2025-04-13 · 2 citations
review · Open access
Tuberculosis (TB) is the leading cause of death from a single infectious agent globally, with the highest burden in low- and middle-income countries. Successful treatment requires prolonged administration of multiple drugs. The increasing threat of multidrug-resistant TB has prompted the development of a robust pipeline of new drugs. Although TB drugs are generally safe and well tolerated, adverse drug reactions (ADRs) to them have a considerable impact on treatment outcomes. Pharmacogenetic testing has been implemented for some diseases to identify at-risk individuals and prevent ADRs. For TB treatment, using pharmacogenetic testing to optimize complex regimens and avoid ADRs is appealing, but implementation has been minimal. To improve the use of pharmacogenetics, an understanding of both the pharmacology of the relevant drugs and the population-specific pathophysiology of ADRs is essential. This review highlights the major treatment-limiting ADRs of TB drugs, the current understanding of drug metabolic pathways, ADR pathophysiology, and known pharmacogenetic risk alleles. We highlight research gaps and barriers to the meaningful clinical use and implementation of pharmacogenomic testing to prevent adverse reactions to TB drugs.
Ascending Aortic Dimensions and Body Size
JACC. Cardiovascular imaging · 2025-08-22 · 2 citations
article
2025-11-16
article · Open access · Senior author · Corresponding
Heart failure (HF) is a highly prevalent, high-burden disorder whose prevalence is expected to increase. Early detection of HF can reduce morbidity and mortality; therefore, novel early detection methods are needed. Polygenic scores (PGS) combine common variants across the genome into phenotype-specific risk scores. However, there are also many well-known non-genomic risk factors for HF in the clinical, lifestyle, and social determinants of health (SDOH) domains, and it is not clear how genetic and non-genetic risk factors collectively contribute to HF risk. To address this question, we assessed whether combining HF PGS with clinical, lifestyle, and SDOH risk factors improves risk prediction. Leveraging data from the All of Us Research Program (n = 22,275), we aggregated clinical risk factors into a clinical risk score (CRS) and lifestyle and SDOH risk factors into a polyexposure score (PXS). Feature selection was conducted with LASSO regression and statistical significance thresholding from logistic regression models (p < 0.05). Features were included in the model if they were statistically significant and important in ≥95% of 1000 iterations. To assess model performance, logistic regression models of HF case/control status were fit for each risk score individually and for integrated models. The integrated model (PGS + CRS + PXS) performed better than the individual risk scores (AUROC = 0.763, AUPRC = 0.047, F1 score = 0.062, balanced accuracy = 0.683). To assess the validity of the CRS and PXS, an integrated model with the PGS along with clinical and exposure risk factors as independent features was also evaluated. Based on AUPRC and F1 score, this integrated risk model (PGS + CRS risk factors + PXS risk factors) performed better than combining the PGS with the CRS and PXS (AUROC = 0.738, AUPRC = 0.047, F1 score = 0.066, balanced accuracy = 0.657). These findings demonstrate that integrating risk factors across multiple domains can improve HF prediction. Knowing that PGS combined with clinical, lifestyle, and SDOH risk factors is predictive of HF risk provides greater opportunity to identify individuals at risk of HF before disease onset, with the goal of prevention or early intervention.
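The inclusion rule described above (a feature is kept only if it is selected in at least 95% of 1000 iterations) can be sketched generically. The per-iteration selector, which in the study combined LASSO with a p < 0.05 filter, is passed in as a function; the bootstrap resampling scheme and the names `stable_features` and `select_fn` are hypothetical assumptions for illustration.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def stable_features(
    rows: Sequence,
    select_fn: Callable[[list], set],
    n_iter: int = 1000,
    threshold: float = 0.95,
    seed: int = 0,
) -> set:
    """Return features chosen by select_fn in at least `threshold` of resampled runs.

    select_fn is a hypothetical stand-in for one selection pass (e.g. LASSO
    followed by a p < 0.05 filter) and must return the set of selected names.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        # Assumed scheme: bootstrap resample of the rows, with replacement.
        sample = rng.choices(rows, k=len(rows))
        counts.update(select_fn(sample))
    return {name for name, c in counts.items() if c / n_iter >= threshold}
```

Requiring near-unanimous selection across resamples trades some sensitivity for features that are robust to sampling noise, which suits risk scores meant to be reused in new cohorts.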
A one-shot, lossless algorithm for cross-cohort learning in mixed-outcomes analysis
Patterns · 2025-07-30
article · Open access
In cross-cohort studies, integrating diverse datasets is essential and challenging due to cohort-specific variations, distributed data storage, and privacy concerns. Traditional methods often require data pooling or harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed electronic health record (EHR) datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,530 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.
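Why summary-statistic sharing can be lossless is easiest to see in the simplest case, a linear regression of one outcome on one covariate: each site shares five sums once, and adding them reproduces the pooled estimate exactly. This sketch is not mixWAS, which handles mixed outcome types and cohort-specific covariate effects; the class and function names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Sufficient statistics for simple linear regression; only these leave a site."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float

    def __add__(self, other: "SiteStats") -> "SiteStats":
        return SiteStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def summarize(x, y) -> SiteStats:
    """Computed locally at each site from its individual-level data."""
    return SiteStats(len(x), sum(x), sum(y), sum(a * a for a in x), sum(a * b for a, b in zip(x, y)))


def fit(s: SiteStats) -> tuple:
    """Ordinary least squares slope and intercept from the sums alone."""
    slope = (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx ** 2)
    intercept = (s.sy - slope * s.sx) / s.n
    return slope, intercept


# Two hypothetical sites; each sends its SiteStats to the coordinator once.
site_a = summarize([1, 2, 3], [2, 4, 7])
site_b = summarize([4, 5], [8, 11])
combined = sum([site_a, site_b], start=SiteStats(0, 0, 0, 0, 0))
```

Because the sums add exactly, a single round of communication suffices and `fit(combined)` matches a fit on the pooled individual-level data, without any individual-level data leaving either site.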
Innovation in Aging · 2025-09-22 · 1 citations
article · Open access
Background and Objectives: Alzheimer's disease (AD) and AD-related dementias (ADRD) are expected to affect over 100 million people by 2050, placing a significant strain on public health systems. Social determinants of health (SDoH), which include factors such as socioeconomic conditions and environment, play a crucial role in AD risk. Despite growing evidence, the understanding of SDoH's impact on AD remains limited. Research Design and Methods: This study leverages large language models and knowledge graphs (KGs) to extract AD-related SDoH knowledge from literature and electronic health records (EHR). We integrate this knowledge into biological research on AD through KG construction and graph deep learning, performing KG-link predictions validated by multimodal biological data from single-cell RNA-seq and proteomics. Results: We generated an SDoH knowledge graph with around 92k triplets, integrating literature and EHR data. In various link prediction experiments, we observed higher accuracy when integrating SDoH into knowledge graphs. Additionally, exploratory predictions uncovered potential SDoH-gene interactions, many of which were validated through differential expression analysis using proteomics and RNA-seq data. Discussion and Implications: This novel KG-based analysis enhances link prediction in AD-related biomedical networks by integrating SDoH and biological knowledge. Our findings highlight the potential interaction between social determinants and biological factors in AD, offering insights into more personalized and socially aware healthcare interventions.
A loss-of-function missense variant in ANGPTL3 exerts protective effects against kidney disease risk
Atherosclerosis · 2025-07-18
article · Open access
Nature Genetics · 2025-11-28
article · Open access
Recent grants
NIH · $27.5M · 2021
Methods for Enhancing Polygenic Risk Prediction Models for Complex Disease
NIH · $2.3M · 2023–2027
Postdoctoral Training Program in Genomic Medicine
NIH · $3.7M · 2017–2027
Artificial Intelligence Strategies for Alzheimer's Disease Research
NIH · $6.7M · 2021–2026
Penn State Biomedical Big Data to Knowledge (B2D2K) Training Program
NIH · $1.2M · 2016–2021
Frequent coauthors
- Anurag Verma · 307 shared
- Dana C. Crawford · Case Western Reserve University · 282 shared
- Dan M. Roden · Vanderbilt University · 268 shared
- Sarah A. Pendergrass · Geisinger Medical Center · 255 shared
- Yuki Bradford · University of Pennsylvania · 224 shared
- Gail P. Jarvik · Seattle University · 216 shared
- Jun Liu · University of California, San Francisco · 207 shared
- Joshua C. Denny · National Institutes of Health · 203 shared
Labs
Marylyn D Ritchie Lab · PI