Molly A Hall

Verified

University of Pennsylvania · Rehabilitation Medicine

Active 1999–2026

h-index: 15
Citations: 762
Papers: 55 (27 in the last 5 years)
Funding: $9k

About

Molly A Hall, Ph.D., is a Research Assistant Professor of Genetics at the University of Pennsylvania's Perelman School of Medicine. She is part of the Department of Genetics and is based in the Richards Building in Philadelphia. Dr. Hall's research expertise includes human genetics, exposome, gene-environment interactions, complex disease, childhood maltreatment and trauma, multi-omic data integration, post-traumatic stress disorder, autism spectrum disorder, attention-deficit/hyperactivity disorder, adverse childhood experiences, and Alzheimer's disease. Her work focuses on understanding the biological and environmental factors contributing to various health conditions, leveraging multi-omic approaches and environmental data to elucidate complex disease mechanisms.

Research topics

  • Computer Science
  • Internal medicine
  • Data science
  • Medicine
  • World Wide Web
  • Virology

Selected publications

  • Exposure to a low dose mixture of endocrine disrupting chemicals alters the brain transcriptome and animal behavior

    bioRxiv (Cold Spring Harbor Laboratory) · 2026-02-11

    article · Open access

    Exposures to pervasive chemical toxicants such as endocrine disrupting chemicals (EDCs) are associated with adverse neurological and neurodevelopmental deficits. Although EDCs are widespread as sparse mixtures in the environment, most research has focused on single chemicals at high concentrations. Here, we studied the effects of ldEDC: a low-dose mixture of widely prevalent toxicants at doses representative of normal human exposure levels. Primary cultured mouse neurons treated with ldEDC exhibited altered gene expression compared to vehicle controls in genes critical for neuron activity, indicating low doses EDCs can affect neuronal function directly. We next tested persistent exposure through the maternal diet to define perinatal effects on offspring. Exposed offspring exhibited differences in development, tactile sensitivity, and sex-specific changes in motor behavior. Cortical single-nuclei sequencing identified broad transcriptomic changes, particularly in distinct cortical layer subpopulations, excitatory neurons, and astrocytes. Cell-cell signaling between neurons and non-neuronal populations were altered in exposed mice, specifically in pathways associated with cellular adhesion. Transcriptomic differences were also sex-specific. Together, these in vitro and in vivo findings reveal molecular and phenotypic consequences of EDC exposure at a mixture of doses well below commonly studied levels and highlights common functional pathways of susceptibility.

  • Phenomic environment-wide association study (PheEWAS) models complexity in the exposome

    medRxiv · 2025-06-14 · 1 citation

    preprint · Open access · Senior author · Corresponding

    Phenome-wide association studies (PheWAS) have successfully identified genomic-based interrelationships between phenotypes but seldom consider environmental exposures. Here, in a phenomic environment-wide association study (PheEWAS), we interrogated relationships between 326 exposures and 55 phenotypes for ∼19,000 participants of the National Health and Nutrition Examination Survey (NHANES). Linear regression models adjusted for age, sex, socioeconomic status, BMI, race/ethnicity, and survey year identified and replicated 106 significant exposure–phenotype associations after Bonferroni correction. The top association was for alpha-tocopherol (vitamin E) with triglycerides (Discovery p = 1.16 × 10⁻¹¹; Replication p = 8.05 × 10⁻¹³). The exposure retinol (vitamin A) had the largest number of individual replicating associations (14 phenotypes including total calcium, iron-binding capacity, ferritin, albumin, transferrin saturation, creatinine, gamma-glutamyl transferase, triglycerides, uric acid, alkaline phosphatase, hemoglobin, and blood urea nitrogen). The phenotype with the greatest number of exposure associations was homocysteine (associated with thiamine; alpha- and gamma-tocopherol; dietary fiber, protein, and potassium; riboflavin; cotinine; folate; phosphorus; cadmium; iron intake; supplement count; and niacin). A race/ethnicity-stratified analysis revealed 11 unique population-specific associations. Our findings demonstrate PheEWAS as a method to provide new details on the complexity of the exposome at the level of the phenome.

  • Neuron-derived circulating miRNAs reveal lead (Pb) as a key component of metal mixtures exposure

    medRxiv · 2025-02-12 · 1 citation

    preprint · Open access

    Environmental exposures rarely occur in isolation, yet biomarkers capturing early brain responses to complex metal mixtures remain limited. Neuron-derived miRNAs detectable in peripheral blood may provide minimally invasive indicators of brain-related molecular processes. We profiled miRNAs from neuron-derived extracellular vesicles in 66 adults and identified 50 dysregulated miRNAs. Among these, miR-16-5p, miR-93-5p, and miR-486-5p were reduced in individuals with higher exposure levels. Metal mixture models identified Pb as the metal most consistently associated with these miRNAs. To explore the translational relevance of these findings, we integrated brain MRI measures and observed that mediation analyses suggested miR-16-5p may represent a potential pathway linking Pb exposure to iron-sensitive MRI signals (R2*, a marker of brain iron) in the red nucleus. Together, these results suggest circulating neuron-derived miRNAs may capture molecular signatures linking complex metal mixtures, with Pb as a key component, to neuronal regulatory pathways and early brain-related perturbation to real-world exposures.

  • Itaconate promotes the differentiation of murine stress erythroid progenitors by increasing Nrf2 activity

    Blood Red Cells & Iron · 2025-02-27 · 2 citations

    article · Open access

    • Anti-inflammatory signals promote the transition to differentiation of SEPs.
    • The metabolite itaconate increases nuclear factor erythroid 2–related factor 2 activity to promote the differentiation of SEPs.

    Steady-state erythropoiesis produces new erythrocytes at a constant rate to replace senescent erythrocytes removed in the spleen and liver. Inflammation caused by infection or tissue damage skews bone marrow hematopoiesis, increasing myelopoiesis at the expense of steady-state erythropoiesis. To compensate for the loss of production, stress erythropoiesis is induced. Stress erythropoiesis is highly conserved between mice and humans. It uses a strategy different to the constant production of steady-state erythropoiesis. Inflammatory signals promote the proliferation of immature stress erythroid progenitors (SEPs), which then commit to differentiation. This transition relies on signals made by niche macrophages in response to erythropoietin. Nitric oxide–dependent signaling drives the proliferation of SEPs, and nitric oxide production must be decreased so that progenitor cells can differentiate. Here, we show that as progenitor cells transition to differentiation, increased production of the anti-inflammatory metabolite itaconate activates nuclear factor erythroid 2–related factor 2, which decreases nitric oxide synthase 2 expression, leading to decreased nitric oxide production. Mutation of immunoresponsive gene 1, the enzyme that catalyzes the production of itaconate, causes a delayed recovery from inflammatory anemia induced by heat-killed Brucella abortus. These data show that the differentiation of SEPs relies on a switch to an anti-inflammatory metabolism and increased expression of proresolving cytokines.

  • DRIVE-KG: Enhancing variant-phenotype association discovery in understudied complex diseases using heterogeneous knowledge graphs

    medRxiv · 2025-08-21

    preprint · Open access

    Multi-omics data are instrumental in obtaining a comprehensive picture of complex biological systems. This is particularly useful for women's health conditions, such as endometriosis which has been historically understudied despite having a high prevalence (around 10% of women of reproductive age). Subsequently, endometriosis has limited genetic characterization: current genome-wide association studies explain only 11% of its 47% total estimated heritability. Graph representations provide an intuitive and meaningful way to relate concepts across diverse data sources and address fundamental sparsity and dimensionality challenges with multi-omics data analysis. Here we present DRIVE-KG (Disease Risk Inference and Variant Exploration-Knowledge Graph), which uses a heterogeneous graph representation to integrate biological data from multi-omics datasets: dbSNP, NCBI Human Gene, Omics Pred, GTEx, and Open Targets. We drew directly from the knowledge captured in these data, using nodes to represent genes, single nucleotide polymorphisms, proteins, and phenotypes, and edges to represent relationships between these concepts. We trained two models using DRIVE-KG: a link prediction model to suggest associations between SNPs and two pilot phenotypes (endometriosis and obesity), and a graph convolutional network (GCN) to classify patient-level endometriosis status. We conducted the patient-level classification using data from 1,441 Penn Medicine BioBank participants with gold standard chart-reviewed endometriosis status. The link prediction model uncovered 66 high-confidence (score ≥ 0.95) previously unreported SNP-endometriosis associations. Many of these variants were linked to obesity/body mass index traits (24.2%), lipid metabolism (6%), and depressive disorders (4.5%), showing agreement with emerging hypotheses about endometriosis etiology. 
    In contrast, 11% of the 149 high confidence, candidate SNP-obesity associations (score ≥ 0.9888) were in LD with known obesity associations. The GCN to classify patient endometriosis status had an AUPRC of 0.738 compared to 0.679 for a genetic risk score. Despite this moderate improvement, we found that the GCN learned meaningful stratification of underlying adenomyosis signal and severe grades of endometriosis. We have demonstrated that heterogeneous integration of multi-omics data is valuable for diverse downstream tasks, including discovery and clinical prediction, particularly for understudied diseases where traditional genomic approaches are insufficient.

  • DRIVE-KG: Enhancing variant-phenotype association discovery in understudied complex diseases using heterogeneous knowledge graphs

    2025-12-01

    article · Open access

    Multi-omics data are instrumental in obtaining a comprehensive picture of complex biological systems. This is particularly useful for women's health conditions such as endometriosis, which has been historically understudied despite having a high prevalence (around 10% of women of reproductive age). Subsequently, endometriosis has limited genetic characterization: current genome-wide association studies explain only 11% of its 47% total estimated heritability, underscoring the need for integrative approaches. Graph representations provide an intuitive and meaningful way to harmonize biological data, using nodes to represent biological concepts (e.g., genes, single nucleotide polymorphisms, proteins, and phenotypes) and edges to represent their relationships. We present DRIVE-KG (Disease Risk Inference and Variant Exploration Knowledge Graph), which uses a heterogeneous graph representation to integrate data from diverse multi-omics datasets. We trained two distinct models using DRIVE-KG: a link prediction model to suggest associations between SNPs and two pilot phenotypes (endometriosis and obesity), and a graph convolutional network (GCN) for patient-level classification of endometriosis/adenomyosis as a combined phenotype. We conducted patient-level classification using data from 1,441 Penn Medicine BioBank participants with gold standard chart-reviewed endometriosis/adenomyosis status. The link prediction model uncovered 66 high-confidence (model score ≥ 0.95) candidate SNP-endometriosis associations, representing largely distinct genetic signals (R2 < 0.1). These variants were enriched for obesity/body mass index traits (24.2%), lipid metabolism (6%), and depressive disorders (4.5%), showing agreement with emerging hypotheses about endometriosis etiology. 
    In contrast, of the high-confidence, candidate SNP-obesity associations that could be evaluated using LDlink, 38.22% were in high linkage disequilibrium (R2 ≥ 0.8) with known obesity or comorbidity associations. The GCN to classify patient endometriosis/adenomyosis status had an F1 score of 0.752 compared to 0.698 for a genetic risk score. Despite this moderate improvement, we found that the GCN learned meaningful stratification of underlying adenomyosis signal and severe endometriosis grades. Together, these results demonstrate that heterogeneous integration of multi-omics data is valuable for diverse downstream tasks, including discovery and clinical prediction, particularly for understudied diseases where traditional genomic approaches are insufficient.

  • Importance of genetic ancestry in pharmacogenomics for precision medicine

    Pharmacogenomics · 2025-12-12

    article · Open access

    Genetic ancestry refers to an individual's biogeographical origins inferred from correlated allele frequencies shared with individuals from similar ancestral regions. Understanding the complexities of genetic ancestry has proven beneficial in the field of pharmacogenomics (PGx), where personalized medication regimens are optimizing therapeutic outcomes while minimizing the risk of side effects. With the rise in the availability of electronic health records (EHR), population-specific genetic data can be integrated with clinical data using machine learning approaches to improve personalized treatment plans. Furthermore, multiomics data such as the transcriptome, methylome, proteome, and metabolome, paired with advances in machine learning methods, provide a more comprehensive approach to understanding genetic variation. The expansion of PGx studies in diverse populations can broaden the impact of precision medicine, particularly among underrepresented groups.

  • Integrated exposomic analysis of lipid phenotypes: Leveraging GE.db in environment by environment interaction studies

    2024-11-21 · 1 citation

    article · Open access · Senior author

    Gene-environment interaction (GxE) studies provide insights into the interplay between genetics and the environment but often overlook multiple environmental factors' synergistic effects. This study encompasses the use of environment by environment interaction (ExE) studies to explore interactions among environmental factors affecting lipid phenotypes (e.g., HDL, LDL, total cholesterol, and triglycerides), which are crucial for disease risk assessment. We developed a novel curated knowledge base, GE.db, integrating genomic and exposomic interactions. In this study, we filtered NHANES exposure variables (available 1999-2018) to identify significant ExE using GE.db. From 101,316 participants and 77 exposures, we identified 263 statistically significant interactions (FDR p < 0.1) in discovery and replication datasets, with 21 interactions significant for HDL-C (Bonferroni p < 0.05). Notable interactions included docosapentaenoic acid (22:5n-3) (DPA) - arachidic acid (20:0), stearic acid (18:0) - arachidic acid (20:0), and blood 2,5-dimethylfuran - blood benzene associated with HDL-C levels. These findings underscore GE.db's role in enhancing -omics research efficiency and highlight the complex impact of environmental exposures on lipid metabolism, informing future health strategies.

  • Differential effects of environmental exposures on clinically relevant endophenotypes between sexes

    Scientific Reports · 2024-09-13 · 1 citation

    article · Open access · Senior author

    Sex and gender differences play a crucial role in health and disease outcomes. This study used data from the National Health and Nutrition Examination Survey to explore how environmental exposures affect health-related traits differently in males and females. We utilized a sex-stratified phenomic environment-wide association study (PheEWAS), which allowed the identification of associations across a wide range of phenotypes and environmental exposures. We examined associations between 272 environmental exposures, including smoking-related exposures such as cotinine levels and smoking habits, and 58 clinically relevant blood phenotypes, such as serum albumin and homocysteine levels. Our analysis identified 119 sex-specific associations. For example, smoking-related exposures had a stronger impact on increasing homocysteine, hemoglobin, and hematocrit levels in females while reducing serum albumin and bilirubin levels and increasing c-reactive protein levels more significantly in males. These findings suggest mechanisms by which smoking exposure may pose higher cardiovascular risks and greater induced hypoxia for women, and greater inflammatory and immune responses in men. The results highlight the importance of considering sex differences in biomedical research. Understanding these differences can help develop more personalized and effective health interventions and improve clinical outcomes for both men and women.

  • Longitudinal method comparison: modeling polygenic risk for post-traumatic stress disorder over time in individuals of African and European ancestry

    Frontiers in Genetics · 2024-05-16 · 1 citation

    article · Open access · Senior author · Corresponding

    Cross-sectional data allow the investigation of how genetics influence health at a single time point, but to understand how the genome impacts phenotype development, one must use repeated measures data. Ignoring the dependency inherent in repeated measures can exacerbate false positives and requires the utilization of methods other than general or generalized linear models. Many methods can accommodate longitudinal data, including the commonly used linear mixed model and generalized estimating equation, as well as the less popular fixed-effects model, cluster-robust standard error adjustment, and aggregate regression. We simulated longitudinal data and applied these five methods alongside naïve linear regression, which ignored the dependency and served as a baseline, to compare their power, false positive rate, estimation accuracy, and precision. The results showed that the naïve linear regression and fixed-effects models incurred high false positive rates when analyzing a predictor that is fixed over time, making them unviable for studying time-invariant genetic effects. The linear mixed models maintained low false positive rates and unbiased estimation. The generalized estimating equation was similar to the former in terms of power and estimation, but it had increased false positives when the sample size was low, as did cluster-robust standard error adjustment. Aggregate regression produced biased estimates when predictor effects varied over time. To show how the method choice affects downstream results, we performed longitudinal analyses in an adolescent cohort of African and European ancestry. We examined how developing post-traumatic stress symptoms were predicted by polygenic risk, traumatic events, exposure to sexual abuse, and income using four approaches—linear mixed models, generalized estimating equations, cluster-robust standard error adjustment, and aggregate regression. 
    While the directions of effect were generally consistent, coefficient magnitudes and statistical significance differed across methods. Our in-depth comparison of longitudinal methods showed that linear mixed models and generalized estimating equations were applicable in most scenarios requiring longitudinal modeling, but no approach produced identical results even if fit to the same data. Since result discrepancies can result from methodological choices, it is crucial that researchers determine their model a priori, refrain from testing multiple approaches to obtain favorable results, and utilize as similar as possible methods when seeking to replicate results.

Labs

  • The Hall Lab · PI
