
Wanding Zhou
Verified · University of Pennsylvania · Rehabilitation Medicine
Active 2011–2026
About
Wanding Zhou, Ph.D., is an Assistant Professor of Pathology and Laboratory Medicine at the University of Pennsylvania, with a secondary appointment at the Children's Hospital of Philadelphia. His research expertise centers on epigenetics, specifically the mitotic-inheritable chemical modifications of chromatin in eukaryotic cells, which instruct the interpretation of genetic information within the nucleus. His research aims to understand how cells translate epigenetic information into transcriptional regulation and phenotypical manifestation, with a focus on leveraging DNA methylation as a robust readout of chromatin state and cell identity. Dr. Zhou's work involves developing and applying DNA methylation-based methods to study human diseases, particularly pediatric malignancies, developmental abnormalities, cognitive deficits, and infectious diseases. His research seeks to uncover biomarkers and develop computational methods to support translational research, improve patient outcomes, and enhance quality of life. He develops informatics tools to support genomic technologies, including methods for versatile use of DNA methylation microarrays and sequencing data, as well as approaches for low-input and single-cell epigenetic assays. His work also integrates DNA methylation with other genomic and chromatin data to bridge technological advances with biological insights.
Research topics
- Biology
- Genetics
- Computational biology
- Evolutionary biology
- Cell biology
- Mathematics
- Virology
- Statistics
- Cancer research
Selected publications
Smoking drives an epigenetic memory of aberrant hematopoiesis
medRxiv · 2026-05-21
article · Open access
Abstract: Tobacco smoking induces DNA methylation (DNAm) changes in blood and other tissues, which may influence chronic health outcomes. However, the breadth of smoking-related DNAm changes remains unmapped, offering a space for employing novel technologies. To expand our understanding of smoking impacts on DNAm, we conducted an epigenome-wide association study (EWAS) comparing ever smokers to never smokers, using blood from a multiethnic U.S. study population (n=887). We employed the newly developed Illumina Methylation Screening Array (MSA) covering 269,094 unique sites, including 123,776 CpGs not assayed in previous EWAS. Trans-ethnic meta-analysis identified 152 differentially methylated positions (DMPs) associated with ever-smoking status (n=764); European-specific analysis yielded 129 DMPs (n=674), including 106 overlapping with trans-ethnic analysis. A separate, large-scale replication EWAS (n=2,190) confirmed 91 trans-ethnic and 77 European-specific DMPs. Among our findings, we identified 61 DMPs at CpGs novel to the MSA platform, including near both new and known smoking-associated genes. Most notably, we uncovered a dense cluster of 12 DMPs within a 1117 bp region of ECEL1P1, forming the most long-lasting, persistent smoking-associated DMR ever detected, even among former smokers who quit decades prior. We also detected new signals at AHRR, a well-known locus for smoking-related DNAm changes. eFORGE analysis revealed that detected smoking-associated DNAm changes are predominantly located in hematopoietic stem and progenitor cell (HSPC) DNase I hotspots, aligning with gene set enrichment analyses that highlighted pathways related to hematopoietic stem cell differentiation. Our findings suggest that HSPCs serve as a reservoir for an epigenetic memory of smoking. Additionally, we observed short-term cell-specific smoking-associated DNAm changes in myeloid cells. Our results demonstrate the utility of the MSA in expanding our knowledge of both transient and persistent environmental exposure-associated DNAm changes.
Highlights
- Applied the state-of-the-art Methylation Screening Array (MSA), 269,094 unique sites including 123,776 not studied previously, which were selected for likely functional relevance.
- Identified 61 novel smoking-associated differentially methylated positions (DMPs), annotated to novel genes as well as genes previously associated with smoking-related DNA methylation.
- Smoking-associated DMPs were enriched in regulatory elements of hematopoietic stem and progenitor cells (HSPCs, via eFORGE analyses and GSEA) and HSPC regulatory genes (e.g. RUNX1), implicating HSPCs as reservoirs of long-term epigenetic memory.
- 12 DMPs collectively form the most long-lasting, persistent smoking-associated differentially methylated region (DMR) detected so far, spanning a 1117 bp region at ECEL1P1.
- Smoking drives two distinct classes of DNAm alterations: transient, myeloid-specific changes and persistent, cell-type-shared signatures originating in HSPCs, forming a dual-track model of smoking-induced epigenetic remodeling.
Dataset for "KnowYourCG: Facilitating base-level sparse methylome interpretation" --- HM450
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-22
dataset · Open access · 1st author · Corresponding
Knowledgebases for the HumanMethylation450 Array (HM450)
This repository hosts curated knowledgebases for the Infinium HumanMethylation450 (HM450) array. These datasets are specifically formatted as RDS files for the knowYourCG Bioconductor package. They enable researchers to perform functional enrichment analysis on 450k data using updated genomic landmarks, chromatin states, and regulatory features.
1. Technical Metadata & Quality Control
Essential datasets for array-level metadata and data cleaning.
- ProbeType - Infinium Type I vs Type II probe design.
- InfiniumChemistry - Technical chemistry metadata for the 450k platform.
- Mask (2026 Update) - Latest probe masking for artifacts and SNPs.
- Mask (Legacy, BioC default) - Previous version of quality masks.
- Blacklist - Genomic regions prone to mapping interference.
2. Genomic Context & Sequence Features
Knowledgebases describing the physical and evolutionary landscape of the 450k probes.
- Chromosome - Updated chromosomal assignments.
- CGI - CpG Island (CGI) associations.
- nFlankCG - Nucleotide composition of sequences flanking the CpG.
- Tetranuc2 - Tetranucleotide frequency signatures.
- rmsk1 & rmsk2 - RepeatMasker repetitive elements.
3. Epigenomic States & Regulatory Elements
Annotations linking 450k sites to functional chromatin and protein binding data.
- ChromHMM & REMCChromHMM - Chromatin state models and Roadmap Epigenomics updates.
- HM - Histone modification peak overlaps.
- TFBSrm - Transcription Factor Binding Sites.
- CTCFbind - CTCF binding/insulator sites.
- ABCompartment - Higher-order chromatin structure (A/B compartments).
- PMD - Partially Methylated Domains.
4. Biological Signatures
Specialized datasets for tissue-specific and developmental biology.
- ImprintingDMR - Differentially Methylated Regions associated with imprinting.
- MetagenePC - Principal components of gene-level methylation.
References & Support
- DOI: 10.5281/zenodo.18344496
- Software: knowYourCG (Bioconductor)
- Background Paper: Goldberg and Fu et al., Science Advances (2025)
Dataset for "KnowYourCG: Facilitating base-level sparse methylome interpretation" --- hg38
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-22
dataset · Open access · 1st author · Corresponding
KYCG Knowledgebase Sets (hg38)
Overview
This repository contains comprehensive knowledgebase sets for the KnowYourCG (KYCG) framework, designed for functional DNA methylation analysis at base-level resolution. These databases enable rapid enrichment testing and interpretation of diverse methylation datasets, including sparse sequencing data (low-pass, single-cell), 5-hydroxymethylation (5hmC) profiles, spatial methylomes, and array-based EWAS datasets.
Citation: Goldberg DC, Fu H, Atkins D, Moyer E, Lee CN, Deng Y, Zhou W. (2025). KnowYourCG: Facilitating base-level sparse methylome interpretation. Science Advances 11(43). DOI: 10.1126/sciadv.adw3027
Reference Coordinates
- cpg_nocontig.cr - Complete reference coordinates for all CpG sites in hg38 (excluding contigs). Essential baseline for enrichment testing and coordinate mapping.
I. Sequence Features
- nFlankCG.20220321.cm - CpG count in flanking regions (standard window)
- nFlankCG50.20231025.cm - CpG count within 50bp flanking regions
- nFlankCG100.20231025.cm - CpG count within 100bp flanking regions
- Tetranuc2.20220321.cm - Four-base sequence context surrounding CpG sites
- CGI.20220904.cm - CpG island annotations
- rmsk1.20220307.cm + .idx - RepeatMasker annotations (class 1)
- rmsk2.20220321.cm + .idx - RepeatMasker annotations (class 2)
II. Genomic Features
- Chromosome.20221129.cm - Basic chromosome annotations
- ChromosomeXY.20230901.cm - Sex chromosome-specific features
- Centromere.20221129.cm - Centromeric regions
- Win100k.20220228.cm - 100kb genomic window annotations
- ABCompartment.20220911.cm - A/B compartment annotations (open/closed chromatin)
- PMD.20220911.cm - Partially Methylated Domains
- CTCFbind.20220911.cm - CTCF binding sites (chromatin loop anchors)
- ChromHMM.20220303.cm - Standard ChromHMM state annotations
- ChromHMMfullStack.20230515.cm - Comprehensive ChromHMM states across multiple cell types
- REMCChromHMM.20220911.cm - Roadmap Epigenomics ChromHMM states
- HM.20221013.cm + .idx - Comprehensive histone modification marks (H3K4me3, H3K27ac, H3K9me3, H3K27me3, etc.)
- MetagenePC.20220911.cm + .idx - Positional information relative to gene features (promoters, gene bodies, 3'UTRs)
- TFBS.20220921.cm + .idx - TFBS collection
- TFBSrm.20221005.cm + .idx - Roadmap Epigenomics TFBS (~1,188 transcription factors)
- RoadMapPosGeneExpCpG.20220814.cm - CpGs positively correlated with gene expression
- RoadMapNegGeneExpCpG.20220814.cm - CpGs negatively correlated with gene expression
III. Trait Associates
- TiSigBLUEPRINT.20221209.cm + .idx - Hematopoietic cell type signatures (blood lineages)
- TiSigBrain.20221209.cm + .idx - Brain cell type signatures (neurons, glia)
- TiSigLoyfer.20221209.cm + .idx - Broad tissue and cell type atlas
- ImprintingDMR.20220818.cm - Genomically imprinted differentially methylated regions
- IntermediateMeth.20221121.cm - CpGs with intermediate methylation levels (25-75%)
- IntermediateMethS.20221121.cm - Stable intermediate methylation sites
- XCILinkedWGBS.20221121.cm - X-chromosome inactivation-associated CpGs
- XCILinkedWGBSSorted.20221121.cm - Sorted XCI-linked sites
IV. Technical Associates
- Blacklist.20220304.cm - Problematic genomic regions for filtering (high coverage artifacts, repeats)
Resources: YAME, KYCG, Bioconductor, hg38 (this page), mm10
Funding: NIH/NIGMS 5R35GM146978
Dataset for "KnowYourCG: Facilitating base-level sparse methylome interpretation" --- hg38
Zenodo (CERN European Organization for Nuclear Research) · 2025-10-24
dataset · Open access · 1st author · Corresponding
KYCG Knowledgebase Sets (hg38)
Overview
This repository contains comprehensive knowledgebase sets for the KnowYourCG (KYCG) framework, designed for functional DNA methylation analysis at base-level resolution. These databases enable rapid enrichment testing and interpretation of diverse methylation datasets, including sparse sequencing data (low-pass, single-cell), 5-hydroxymethylation (5hmC) profiles, spatial methylomes, and array-based EWAS datasets.
Citation: Goldberg DC, Fu H, Atkins D, Moyer E, Lee CN, Deng Y, Zhou W. (2025). KnowYourCG: Facilitating base-level sparse methylome interpretation. Science Advances 11(43). DOI: 10.1126/sciadv.adw3027
Reference Coordinates
- cpg_nocontig.cr - Complete reference coordinates for all CpG sites in hg38 (excluding contigs). Essential baseline for enrichment testing and coordinate mapping.
I. Sequence Features
- nFlankCG.20220321.cm - CpG count in flanking regions (standard window)
- nFlankCG50.20231025.cm - CpG count within 50bp flanking regions
- nFlankCG100.20231025.cm - CpG count within 100bp flanking regions
- Tetranuc2.20220321.cm - Four-base sequence context surrounding CpG sites
- CGI.20220904.cm - CpG island annotations
- rmsk1.20220307.cm + .idx - RepeatMasker annotations (class 1)
- rmsk2.20220321.cm + .idx - RepeatMasker annotations (class 2)
II. Genomic Features
- Chromosome.20221129.cm - Basic chromosome annotations
- ChromosomeXY.20230901.cm - Sex chromosome-specific features
- Centromere.20221129.cm - Centromeric regions
- Win100k.20220228.cm - 100kb genomic window annotations
- ABCompartment.20220911.cm - A/B compartment annotations (open/closed chromatin)
- PMD.20220911.cm - Partially Methylated Domains
- CTCFbind.20220911.cm - CTCF binding sites (chromatin loop anchors)
- ChromHMM.20220303.cm - Standard ChromHMM state annotations
- ChromHMMfullStack.20230515.cm - Comprehensive ChromHMM states across multiple cell types
- REMCChromHMM.20220911.cm - Roadmap Epigenomics ChromHMM states
- HM.20221013.cm + .idx - Comprehensive histone modification marks (H3K4me3, H3K27ac, H3K9me3, H3K27me3, etc.)
- MetagenePC.20220911.cm + .idx - Positional information relative to gene features (promoters, gene bodies, 3'UTRs)
- TFBS.20220921.Part1.cm + .idx - TFBS collection Part 1
- TFBS.20220921.Part2.cm + .idx - TFBS collection Part 2
- TFBSrm.20221005.cm + .idx - Roadmap Epigenomics TFBS (~1,188 transcription factors)
- RoadMapPosGeneExpCpG.20220814.cm - CpGs positively correlated with gene expression
- RoadMapNegGeneExpCpG.20220814.cm - CpGs negatively correlated with gene expression
III. Trait Associates
- TiSigBLUEPRINT.20221209.cm + .idx - Hematopoietic cell type signatures (blood lineages)
- TiSigBrain.20221209.cm + .idx - Brain cell type signatures (neurons, glia)
- TiSigLoyfer.20221209.cm + .idx - Broad tissue and cell type atlas
- ImprintingDMR.20220818.cm - Genomically imprinted differentially methylated regions
- IntermediateMeth.20221121.cm - CpGs with intermediate methylation levels (25-75%)
- IntermediateMethS.20221121.cm - Stable intermediate methylation sites
- XCILinkedWGBS.20221121.cm - X-chromosome inactivation-associated CpGs
- XCILinkedWGBSSorted.20221121.cm - Sorted XCI-linked sites
IV. Technical Associates
- Blacklist.20220304.cm - Problematic genomic regions for filtering (high coverage artifacts, repeats)
Resources
- Documentation: YAME, KYCG, Bioconductor
- Downloads: hg38, mm10
Funding: NIH/NIGMS 5R35GM146978
Classifying AD and PART: An Epigenetic Signature of Cognitive Resilience
Alzheimer's & Dementia · 2025-12-01
article · Open access
BACKGROUND: Whether primary age-related tauopathy (PART) is a distinct age-associated disorder or merely a prodrome of Alzheimer's disease (AD) remains controversial. Both share similar limbic tau but differ in amyloid burden and tau spread. Differential DNA methylation (DNAm) can offer insights into disease biology, biomarkers, and potential therapeutic targets. We therefore used a machine learning classifier trained on DNAm to distinguish PART from AD and then applied the classifier to stratify pathologically indeterminate cases.
METHOD: We evaluated frontal cortex DNAm from ROSMAP (N = 707) and trained a support vector machine classifier on 176 PART (A0-3, B0-3, C0) and 118 AD (A0-3, B3, C3) cases. We validated on 142 external cases from the Mount Sinai Brain Bank. We then applied the classifier to stratify neuropathologically indeterminate cases (A0-3, B0-3, C1-2 and A0-3, B0-2, C3) as Predicted-PART or Predicted-AD. We compared the neuropathological, cognitive, DNAm, and transcriptomic profiles of the prediction groups.
RESULTS: When trained on a random sample of 80% of PART and AD cases, the classifier achieved 65% positive and 81% negative predictive value on the remaining 20% of cases. A final model trained on all ROSMAP PART and AD cases classified a majority of external Braak NFT Stage 0-II cases as Predicted-PART (63%) and a majority of Braak NFT Stage V-VI cases as Predicted-AD (85%). The classifier stratified neuropathologically indeterminate cases into prediction groups that had similar tau and amyloid burden but differed in methylation at 570 CpGs, associated expression of 2,179 genes, and gene ontology terms related to vesicle transport, oxidative phosphorylation, and synaptic transmission. Despite similarities in neuropathological burden, Predicted-PART individuals scored higher on the MMSE than Predicted-AD individuals (p < 1E-5).
CONCLUSIONS: DNAm distinguishes PART from AD. The DNAm-informed machine learning tool predicts PART vs. AD with high accuracy in multiple cohorts. Moreover, it stratifies indeterminate cases with similar pathology into biologically distinct groups. Together, the data suggest that in individuals with PART, a specific brain epigenetic and biological program contributes to resistance to AD pathology and associated cognitive resilience.
Zenodo (CERN European Organization for Nuclear Research) · 2025-10-24
dataset · Open access · 1st author · Corresponding
A fast and lightweight toolkit for storing, manipulating, and analyzing large-scale DNA methylation data at the sequence level. For detailed documentation, tutorials, and usage examples, visit the YAME User Guide. YAME is designed for efficient sequence-level DNA methylation data management, capable of handling both bulk and single-cell DNA methylome workflows. It introduces a family of compact binary formats (CX formats) that represent methylation values, MU counts, categorical states, fraction data, masks, and genomic coordinates in a uniform compressed structure.
Brain telomere length associates with hippocampal ptau and is mediated by DNA methylation
Alzheimer's & Dementia · 2025-12-01
article · Open access
BACKGROUND: Telomeres are repetitive DNA sequences at the ends of chromosomes that help maintain chromosomal stability. Telomere shortening is a hallmark of aging, and shorter blood leukocyte telomere length (LTL) has been associated with increased risk for age-related diseases; however, little is understood about the biology of brain telomeres and how they may be involved in disease. Considering the increased neuropathologic burden of phosphorylated tau (ptau) with age, we investigated how shorter brain telomere length (brain-TL) may relate to increased ptau burden.
METHODS: We studied a cohort of 112 individuals with primary age-related tauopathy (PART), a neuropathological diagnosis characterized by mild-to-moderate tau burden (Braak=I-IV) primarily in the medial temporal lobe, with the relative absence of amyloid-beta plaques (CERAD=0). These individuals had both brain-TL (mean length by telomere qPCR, blinded) and DNA methylation measures from the frontal cortex, along with semi-quantitative Aperio ptau measures from the hippocampus. A subset (n = 81) had SNP genotyping data available. In an independent cohort (n = 10, Braak=0-VI, CERAD=0-3), we performed quantitative fluorescence in-situ hybridization (FISH) microscopy to measure the average ratio of telomere to centromere DNA content in nuclei from the frontal and visual cortices.
RESULTS: In linear regression models, frontal cortex brain-TL did not relate to age. When age-adjusted, shorter brain-TL related to higher hippocampal ptau (β=-1.06, CI=-1.92 to -0.195, p = 0.017). A previously established DNA methylation model predictive of hippocampal ptau partially mediated the relationship between brain-TL and hippocampal ptau (proportion mediated=0.664, CI=0.246-1.33, p = 0.012, Figure 1). A polygenic score for LTL did not relate to age, brain-TL, or hippocampal ptau. With FISH, we observed that individuals with CERAD=0 had shorter telomeres in the frontal cortex compared to individuals with CERAD=3. Within the CERAD=0 group, an individual with Braak=II had shorter telomeres than an individual with Braak=I. These patterns were not observed in the visual cortex.
CONCLUSIONS: In a PART cohort, shorter frontal cortex brain-TL was related to higher hippocampal ptau, and this relationship was partially mediated by a DNA methylation model predictive of hippocampal ptau. A polygenic score for LTL was not predictive of brain-TL or hippocampal ptau. Together, this further emphasizes the importance of tissue-specific epigenetic modifiers of age-related ptau neuropathology.
Ecological Realism Accelerates Epigenetic Aging in Mice
Aging Cell · 2025-05-21 · 3 citations
article · Open access · Senior author · Corresponding
The aging of mammalian epigenomes fundamentally alters cellular functions, and such changes are the focus of many healthspan and lifespan studies. However, studies of this process typically use mouse models living under standardized laboratory conditions and neglect the impact of variation in social, physical, microbial, and other aspects of the living environment on age-related changes. We examined differences in age-associated methylation changes between traditionally laboratory-reared mice from Jackson Laboratory and "rewilded" C57BL/6J mice, which lived in an outdoor field environment at Cornell University with enhanced ecological realism. Systematic analysis of age-associated methylation dynamics in the liver indicates a genomic region-conditioned, faster epigenetic aging rate in mice living in the field than those living in the lab, implicating perturbed 3D genome conformation and liver function. Altered epigenetic aging rates were more pronounced in sites that gain methylation with age, including sites enriched for transcription factor binding related to DNA repair. These observations underscore the overlooked role of the social and physical environment in epigenetic aging with implications for both basic and applied aging research.
KnowYourCG: Facilitating base-level sparse methylome interpretation
Science Advances · 2025-10-24
article · Open access · Senior author · Corresponding
Decoding DNA methylomes for biological insights is critical in epigenetics research. We present KnowYourCG (KYCG), a data interpretation framework designed for functional DNA methylation analysis. Unlike existing tools that target genes or genomic intervals, KYCG features direct base-level screenings of diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait associations. Through implementing efficient infrastructure that rapidly screens and investigates thousands of knowledgebases, KYCG addresses the challenges of data sparsity in various methylation datasets, including low-pass or single-cell DNA methylomes, 5-hydroxymethylation (5hmC) profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies. Applying KYCG to these datasets provides valuable insights into cell differentiation, cancer origins, epigenome-trait associations, and technical issues such as array artifacts, single-cell batch effects, and Nanopore 5hmC detection accuracy. Our tool simplifies large-scale methylation analysis and integrates seamlessly with standard assay technologies.
Recent grants
Decoding Single-cell DNA Methylomes for Epigenetic Cell Identity
NIH · $1.8M · 2022–2027
Frequent coauthors
- Andrew D. Cherniack · 165 shared
- L. Sylvia · Mirai Hospital · 151 shared
- Joshua M. Stuart · University of California, Santa Cruz · 144 shared
- Hui Shen · Van Andel Institute · 142 shared
- Rory Johnson · University Hospital of Bern · 132 shared
- Linghua Wang · 115 shared
- Gad Getz · 112 shared
- Galen F. Gao · 109 shared
Education
- 2013 · PhD, Bioengineering · Rice University