
PingHsun Hsieh
Assistant Professor · University of Minnesota · Cell Biology
Active 1964–2026
About
I am an Assistant Professor in the Department of Genetics, Cell Biology, and Development at the University of Minnesota, Twin Cities. My research interests are population genomics, human evolution, and evolutionary medicine. The research of our lab focuses on understanding key evolutionary processes, such as hybridization and selection, that lead to genetic novelties in populations and studying mutation effects in humans. I am particularly interested in the evolution and fitness consequences of structural variants and building evolutionary applications for biomedical research. My research involves using long-read sequencing, designing statistical methods, and analyzing large multi-omics and simulated datasets.
Research topics
- Biology
- Genetics
- Evolutionary biology
- Computational biology
Selected publications
HumanSeq_fromAlignment_T2TCHM13v2_clint
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-03
dataset · Open access · 1st author · Corresponding
These files are human/chimp reference sequences extracted from an alignment between T2T-CHM13v2.0 and panTro6. They are intended only for annotating ancestral states for sites listed in a VCF file.
Retroviral insertions contributed to the divergence of human and chimpanzee brains
bioRxiv (Cold Spring Harbor Laboratory) · 2025-12-12 · 1 citation
preprint · Open access
Over the past 5-7 million years, humans and chimpanzees have diverged in brain size, structural complexity, and cognitive abilities despite high conservation of protein-coding genes. Notably, the endogenization and proliferation of retroviral infections within host genomes has introduced numerous species-specific regulatory elements that have the potential to influence gene regulation. However, the role of these endogenous retroviruses in hominoid brain evolution remains unclear. A burst of lineage-specific PTERV1 retroviruses recently invaded the chimpanzee genome but are absent in humans. We conducted an epigenomic analysis of PTERV1 insertions in chimpanzee neural organoids and found that they are heavily covered by DNA methylation, representing more than 150 species-specific heterochromatin domains with the capacity to influence gene regulatory networks. We identified one such chimpanzee-specific PTERV1 insertion on chromosome 19 that blocks the expression of the long noncoding RNA LINC00662, via DNA methylation spread to the adjacent genomic region. The expression of LINC00662 was restored in chimpanzee induced pluripotent stem cells when we deleted the PTERV1 insertion using CRISPR editing. We found that LINC00662, a human-specific RNA, is highly expressed in the developing brain and plays an important role in the posttranscriptional control of neuronal maturation, axon outgrowth, and neural organoid development. In summary, our findings describe how endogenous retroviral insertions contributed to the functional divergence of the human and chimpanzee brains. This provides a new mechanism by which retroviral pandemics influenced primate brain speciation.
Structural and transduction patterns of human-specific polymorphic SVA insertions
Mobile DNA · 2025-11-06 · 1 citation
article · Open access
BACKGROUND: SINE variable number tandem repeat Alu elements (SVAs) are a unique group of hominid-specific composite retrotransposons with highly variable internal structure. They represent the youngest TE family in humans and contribute to genetic diversity, evolution, and disease. Recent findings indicate that SVA mobilization rates may exceed previous estimates, and many SVAs exhibit insertion polymorphism. SVAs facilitate transduction (TD) events when transcription initiates upstream of a source element, or when their internal termination signal is bypassed, mobilizing adjacent 5' and/or 3' sequence. To investigate features of non-reference SVA elements currently polymorphic in the human genome, we analyzed a structural variant callset built upon 35 diverse human genomes generated by the Human Genome Structural Variation Consortium. RESULTS: is a major contributor to SVA expansion in the human population. We further uncover that 40% of non-reference SVAs carry a TD on their 5' and/or 3' ends. Of these, the majority (69%) harbor sequence originating in a gene, including 14 exonic events and the mobilization of a processed pseudogene, supporting the role of SVA in exon shuffling. In addition, we identified a so-called "orphan" TD, defined by the absence of SVA sequence at the insertion site. Leveraging TD origin coordinates, we identify 55 active source elements, including nine non-reference and 46 across GRCh38 and T2T-CHM13, giving rise to 84% of TD-carrying SVAs. CONCLUSIONS: is more active than previously described and is a main driver of SVA expansion. We find two-fold more TD events compared to previous estimates, with an unexpected bias toward 3' events. Finally, we postulate that the discrepant SVA mobilization rate may be attributed to inter-individual variation in the presence/absence of source elements, a recent uptick in mobilization supported by overall low allele frequencies, and/or negative selection against deleterious insertions.
bioRxiv (Cold Spring Harbor Laboratory) · 2025-02-05 · 3 citations
preprint · Open access
The NPIP (nuclear pore interacting protein) gene family has expanded to high copy number in humans and African apes where it has been subject to an excess of amino acid replacement consistent with positive selection (1). Due to the limitations of short-read sequencing, NPIP human genetic diversity has been poorly understood. Using highly accurate assemblies generated from long-read sequencing as part of the human pangenome, we completely characterize 169 human haplotypes (4,665 NPIP paralogs and alleles). Of the 28 NPIP paralogs, just three (NPIPB2, B11, and B14) are fixed at a single copy, and only a single locus, B2, shows no structural variation. Four NPIP paralogs map to large segmental duplication blocks that mediate polymorphic inversions (355 kbp–1.6 Mbp) corresponding to microdeletions associated with developmental delay and autism. Haplotype-based tests of positive selection and selective sweeps identify two paralogs, B9 and B15, within the top percentile for both tests. Using full-length cDNA data from 101 tissue/cell types, we construct paralog-specific gene models and show that 56% (31/55 most abundant isoforms) have not been previously described in RefSeq. We define six distinct translation start sites and other protein structural features that distinguish paralogs, including a variable number tandem repeat that encodes a beta helix of variable size that emerged ∼3.1 million years ago in human evolution. Among the 28 NPIP paralogs, we identify distinct tissue and developmental patterns of expression with only a few maintaining the ancestral testis-enriched expression. A subset of paralogs (NPIPA1, A5, A6-9, B3-5, and B12/B13) show increased brain expression. Our results suggest ongoing positive selection in the human population and rapid diversification of NPIP gene models.
A global view of human centromere variation and evolution
bioRxiv (Cold Spring Harbor Laboratory) · 2025-12-11 · 5 citations
preprint · Open access
Centromeres are essential for accurate chromosome segregation during cell division, yet their highly repetitive sequence has historically hindered their complete assembly and characterization. Consequently, the full spectrum of centromere diversity across individuals, populations, and evolutionary contexts remains largely unexplored. Here, we address this gap in knowledge by assembling and characterizing 2,110 complete human centromeres from a diverse cohort of individuals representing 5 continental and 28 population groups. By developing a novel suite of bioinformatic tools tailored for centromeric regions, we uncover previously unknown variation within centromeres, including 226 novel centromere haplotypes and 1,870 new α-satellite higher-order repeat (HOR) variants. We find that mobile element insertions are present in 30% of centromeres, with chromosome 16 harboring Alu elements within the kinetochore site at an 11-fold higher frequency than expected. While most centromeres have a single kinetochore site, 6% of them have di-kinetochores, and <<1% have tri-kinetochores, which we confirm with long-read CENP-A CUT&RUN, DiMeLo-seq, and multi-generational inheritance. We further show that the position of the kinetochore is not random and is, instead, closely associated with the underlying sequence and structure of the centromere. To understand the nature of evolutionary change, we compared 2,110 complete human centromeres to 5,747 complete centromeres recently assembled from the Human Pangenome Reference Consortium. We show that centromeres have a >50-fold variation in mutation rate, with the most rapidly mutating centromeres on chromosome 1 and the slowest mutating centromeres on chromosome Y. Additionally, a subset of centromeres show evidence of introgression from archaic hominins, shaping their sequence, structure, and evolutionary history.
We validate these centromere mutation rates in a four-generation family, spanning 28 family members and 483 accurately assembled centromeres, and show that the kinetochore site is the most rapidly mutating region in the centromere, with twofold more single-nucleotide variants than the rest of the centromeric α-satellite HOR array on average. We propose a model that reveals an ‘arms race’ between centromeric sequence and proteins, with frequent mutations within the site of the kinetochore that lead to changes in genetic and epigenetic landscapes and, ultimately, rapid evolution of these critically important regions.
Cell Genomics · 2025-08-22 · 7 citations
article · Open access
The NPIP gene family is among the most positively selected gene families in humans/apes and drives independent duplication in primate lineages. These duplications promote genetic instability, leading to recurrent disease-associated microduplication and microdeletion syndromes. Despite its importance, little is known about its function or variation in humans, as short-read sequencing cannot distinguish high-identity duplications. Using long-read assemblies of 169 human haplotypes, we find extreme variation in the content and organization of NPIP loci. We identify fixed and polymorphic paralogs and observe ongoing positive selection. With long-read RNA sequencing (RNA-seq), we create paralog-specific gene models, the majority of which were not previously documented, and observe paralog-specific tissue specificity. This analysis of an exceptionally dynamic gene family provides candidates for future functional study.
Genome-wide diversity of chromosomal inversions and their disease relationships
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-26
preprint · Open access · Senior author · Corresponding
Chromosomal inversions shape evolution and are implicated in human disease, yet their effects on genomic variation and health outcomes remain poorly understood. We analyze genome-wide human inversion polymorphisms, contrasting single-event and recurrent loci. Inversion recurrence is validated using structured-coalescent simulations. We show that single-event inversions evolve in near-complete isolation: inverted haplotypes show ∼16-fold lower diversity and strong differentiation from direct haplotypes (median F_ST = 0.33). By contrast, recurrent inversions maintain gene flow, resulting in similar diversity across orientations and ∼4-fold lower differentiation. We further find marked differences in coding sequence conservation between single-event and recurrent inversions. Using the NIH All of Us biobank, we impute inversions and identify four inversions with significant disease associations. Notably, the 17q21 inversion is associated with reduced risk of cognitive decline (OR=0.919) and breast cancer (OR=0.910) but with increased obesity risk (OR=1.097), consistent with pleiotropic selection. These findings establish inversions as major drivers of human genetic diversity and disease, with evolutionary outcomes critically dependent on recurrence.
A global map for introgressed structural variation and selection in humans
bioRxiv (Cold Spring Harbor Laboratory) · 2025-06-24 · 4 citations
preprint · Open access · 1st author
Genetic introgression from Neanderthals and Denisovans has shaped modern human genomes; however, introgressed structural variants (SVs ≥50 base pairs) remain challenging to discover. We integrated high-quality phased assemblies from four new Papua New Guinea (PNG) genomes with 94 published assemblies of diverse ancestry to infer an archaic introgressed SV map. Introgressed SVs are overall enriched in genes (44%, n=1,592), including critical genomic disorder regions, and most abundant in PNG. We identify 11 centromeres likely derived from archaic hominins, adding unexplored diversity to centromere genomics. Pangenome genotyping across 1,363 samples reveals 16 candidate adaptive SVs, many associated with immune-related genes and their expression, in the PNG. We hypothesize that archaic SV introgression contributed to reproductive success, underscoring introgression as a significant force in human adaptive evolution.
Structural and genetic diversity in the secreted mucins MUC5AC and MUC5B
The American Journal of Human Genetics · 2024-07-10 · 22 citations
article · Open access
The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ∼5,654 aa), H2 (33%, ∼5,742 aa), and H3 (7%, ∼6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
Recent grants
The fitness effects of de novo structural variants
NIH · $264k · 2020–2022
Frequent coauthors
- Evan E. Eichler · University of Washington · 82 shared
- Katherine M. Munson · University of Washington · 33 shared
- Arvis Sulovari · 29 shared
- David Porubský · University of Washington · 28 shared
- Shwetha C. Murali · Vellore Institute of Technology University · 24 shared
- Stuart Cantsilieris · University of Washington · 23 shared
- Kai Ye · Rice University · 21 shared
- Marc Jan Bonder · University of Groningen · 20 shared
Education
B.S., Computer Engineering
National Central University, Taiwan
M.S., Electrical Engineering
National Taiwan University, Taiwan
M.S., Computational Biology
University of Southern California, CA
Ph.D., Ecology and Evolutionary Biology
University of Arizona