Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Pei Fen Kuan

Verified

Stony Brook University · Psychology

Active 1992–2025

h-index: 44
Citations: 7.3k
Papers: 197 (71 in the last 5 years)
Funding: $1.5M

About

Pei Fen Kuan is a Professor in the Department of Applied Mathematics and Statistics at Stony Brook University. She received her B.Sc. with First Class Honours in Statistics and Applied Mathematics from the National University of Singapore, and her M.S. and Ph.D. in Statistics from the University of Wisconsin–Madison. Her research centers on statistical, computational, and AI-driven approaches to challenges in genome biology. Her lab utilizes tiling arrays and next-generation sequencing platforms to study biological regulation in cancer, aging, and psychiatric disorders. A key focus of her work is developing and applying statistical and machine learning methods to enhance the analysis, integration, and interpretation of high-throughput omics data.

Research topics

  • Internal medicine
  • Medicine
  • Biology
  • Genetics
  • Pathology
  • Computational biology
  • Cancer research

Selected publications

  • Increased Aβ40 in plasma is associated with severity of exposure to airborne pollutants at the World Trade Center: a cross-sectional study of neurological biomarkers

    The Journals of Gerontology Series A · 2025-06-24 · 2 citations

    article · Open access

    We hypothesized that World Trade Center (WTC) responders who were more severely exposed to airborne pollution during rescue and recovery work would have heightened circulating levels of β-amyloid (Aβ) in plasma. Plasma for 905 WTC responders was retrieved in 2019, flash frozen, and assayed using single molecule analysis to measure circulating levels of two subtypes of Aβ (Aβ40, Aβ42), alongside phosphorylated tau-181, glial fibrillary acidic protein (GFAP), and neurofilament-light. Plasma data were linked to demographics, blood volume, apolipoprotein-ε4 status, and medical outcomes as well as, in a subsample, with neuroimaging-based measures of cortical thickness. Amyloidogenesis was measured using the ratio of observed/expected levels of Aβ40 and labeled Normalized Aβ40. Spearman's rho was used to examine correlations; generalized linear modeling was used to examine multivariable-adjusted associations. The average age of WTC responders was 55.98 years, and 73.9% had completed at least some college. Observed Aβ40 levels were 24.61% higher than expected values, and lower in minimally exposed WTC responders as compared to severely exposed WTC responders (17.26% vs 44.48%, P = .005). Results remained statistically significant upon adjusting for covariates (adjusted blood volume ratio = 1.11 [1.02-1.22] P = .019). Normalized Aβ40 levels were associated with higher measures of phosphorylated tau-181, Aβ42, GFAP, and neurofilament-light in serology as well as, in a subsample (n = 70), with reduced cortical thickness (rho = -0.29, P = .020). Increased amyloidogenesis may be a neuropathological response in people who are severely or chronically exposed to airborne neurotoxic pollutants.

  • Supplementary Data1 from Tumor miRNA Signatures Associate with Outcomes of Patients with Stage II/III Melanoma

    2025-12-15

    article · Open access

    Supplementary Tables

  • Tumor miRNA Signatures Associate with Outcomes of Patients with Stage II/III Melanoma

    Clinical Cancer Research · 2025-10-20

    article

    PURPOSE: Patients with stage II and resected stage III melanomas have variable clinical outcomes, providing evidence of underlying biological differences in tumors and/or the patients themselves, beyond stage. The approval of adjuvant immunotherapy for stage IIB/C and resected stage III/IV disease (and adjuvant targeted therapy for resected stage III disease) has created a pressing need to develop biomarkers to accurately distinguish patients at low risk versus high risk for recurrence and death from melanoma. miRNAs are promising biomarkers because of their stability in tissues and fluids and their demonstrated functional and prognostic roles in melanoma. We hypothesized that miRNA expression could be integrated into prognostic models that would classify 5-year survival outcomes more accurately than clinical factors alone. EXPERIMENTAL DESIGN: Using a NanoString miRNA Expression Assay, we analyzed 715 primary melanomas from patients with stage II or stage III disease within the InterMEL consortium and examined associations between miRNA expression and melanoma-specific death. RESULTS: When integrated into clinical prognostic models for 5-year melanoma-specific survival, miRNA signatures improved the area under the receiver operating characteristic curve for patients in stage II from 0.71 for a "clinical factors-only" model to 0.81 for a "clinical plus miRNA" model in an independent test set, an improvement of 0.10 with a 95% confidence interval (0.03-0.19). The improvement was more modest for patients in stage III who were included in the analysis. CONCLUSIONS: Incorporating miRNA expression in primary melanomas may enhance the accuracy of clinical prognostic models and potentially aid in the selection of patients with melanoma for adjuvant treatment and clinical trials.

  • Supplementary Figures1 from Tumor miRNA Signatures Associate with Outcomes of Patients with Stage II/III Melanoma

    2025-12-15

    article · Open access

    Supplementary Figures S1-S9

  • A systematic evaluation of cell-type-specific differential methylation analysis in bulk tissue

    Briefings in Bioinformatics · 2025-03-01

    article · Open access · Senior author

    We conducted a systematic assessment of computational models (CellDMC, TCA, HIRE, TOAST, and CeDAR) for detecting cell-type-specific differentially methylated CpGs in bulk methylation data profiled using the Illumina DNA Methylation BeadArrays. This assessment was performed through simulations and case studies involving two epigenome-wide association studies (EWAS) on rheumatoid arthritis and major depressive disorder. Our evaluation provided insights into the strengths and limitations of each model. The results revealed that the models varied in performance across different metrics, sample sizes, and computational efficiency. Additionally, we proposed integrating the results from these models using the minimum p-value (minpv) and average p-value (avepv) approaches. Our findings demonstrated that these aggregation methods significantly improved performance in identifying cell-type-specific differentially methylated CpGs.
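The minimum and average p-value aggregation mentioned in this abstract can be sketched as follows. This is a minimal illustration only: the function name, the input layout, and the toy p-values are hypothetical, and the abstract does not say how the paper calibrates or thresholds the aggregated values. Note that the raw minimum of several p-values is not itself a calibrated p-value, so a real analysis would need a further correction step.

```python
from statistics import fmean


def aggregate_pvalues(pvals_by_method):
    """Combine per-CpG p-values across methods.

    pvals_by_method maps a method name to a list of p-values,
    with every list covering the same CpGs in the same order.
    Returns (minpv, avepv), one value per CpG.
    """
    # Transpose so each tuple holds all methods' p-values for one CpG.
    per_cpg = list(zip(*pvals_by_method.values()))
    minpv = [min(values) for values in per_cpg]
    avepv = [fmean(values) for values in per_cpg]
    return minpv, avepv


# Hypothetical p-values for four CpGs from three of the compared methods.
toy = {
    "CellDMC": [0.001, 0.20, 0.50, 0.04],
    "TCA": [0.010, 0.30, 0.40, 0.06],
    "TOAST": [0.004, 0.10, 0.90, 0.02],
}
minpv, avepv = aggregate_pvalues(toy)
print("minpv:", minpv)
print("avepv:", [round(p, 4) for p in avepv])
```

The minimum favors CpGs that any single method flags strongly, while the average favors CpGs on which the methods agree.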

  • Reduced Gray-White Matter Contrast in Chronic Posttraumatic Stress Disorder in World Trade Center Responders

    Biological Psychiatry Cognitive Neuroscience and Neuroimaging · 2025-11-01

    article · Open access

  • Polygenic Risk and Exposure Severity Predict Trajectories of PTSD: A Prospective Cohort Study

    Molecular Psychiatry · 2025-09-19

    article · Open access

  • Distinct Characteristics of Lymphoid and Myeloid Clonal Hematopoiesis in World Trade Center First Responders

    American Journal of Hematology · 2025-07-29 · 1 citation

    letter · Open access

    CHIP in WTC first responders: Study design. More than two decades after the 9/11 attacks, long-term health consequences for World Trade Center (WTC) rescue and recovery workers continue to emerge. Over 91 000 individuals participated in rescue, recovery, debris cleanup, and restoration of essential services. These first responders, including firefighters, police officers, paramedics, engineers, steel workers, railway tunnel workers, telecommunications specialists, sanitation workers, medical examiners, and volunteers, had no prior training in civil disaster response. These responders faced unprecedented exposure to a complex mix of known or suspected airborne carcinogens, raising concerns regarding their long-term cancer risk. WTC responders inhaled a toxic blend including benzene, formaldehyde, asbestos, silica, cement dust, glass fibers, heavy metals, polycyclic aromatic hydrocarbons, polychlorinated biphenyls, polychlorinated dibenzofurans, and dioxins [1]. While the carcinogenicity of these individual compounds is well documented, their collective impact on hematologic malignancy risk remains poorly characterized, creating a critical knowledge gap. A key mechanism potentially linking environmental exposures to cancer development is clonal hematopoiesis [2], where a subpopulation of blood cells harbors clonal genomic mutations associated with blood cancers, without detectable hematologic disorders or unexplained persistent cytopenia. Clonal hematopoiesis of indeterminate potential (CHIP) involves clonal subpopulations carrying a point mutation or short insertion/deletion with a variant allele fraction (VAF) of at least 2% in genes recurrently mutated in hematologic malignancies. While most research has focused on myeloid lineage CHIP (M-CHIP), lymphoid lineage CHIP (L-CHIP) also occurs and may contribute to lymphoid malignancy risk, though it remains understudied. 
Beyond cancer risk, CHIP is associated with cytopenias, cardiovascular disease, infection susceptibility, and all-cause mortality, making it a potentially important biomarker for multiple health conditions relevant to WTC responders. To characterize the full spectrum of CHIP mutations in WTC responders, we performed ultra-deep Whole Exome Sequencing (WES) at 250× on 350 blood samples from 345 participants from the World Trade Center Health Program (WTCHP) General Responders Cohort (GRC) cohort aged 48–90 years (median, 59 years), without a history of hematologic malignancy at enrollment (Figure 1A, Table 1). After rigorous quality control, we identified M-CHIP and L-CHIP mutations (Figure 1B, Table S1) and analyzed their associations with age, ancestry, WTC exposure, HLA zygosity, and various clinical, laboratory, mental, and cognitive metrics. CHIP prevalence was also compared to an unexposed New York City control cohort (n = 293; Table S2). Detailed methodology is provided in the Supporting Information. We found that 34.2% (118/345) of WTC participants harbored at least one CHIP mutation. 16.2% harbored M-CHIP mutations (56 participants, 71 mutations, 13 genes), and 21.4% carried L-CHIP mutations (74 participants, 85 mutations, 43 genes) (Table S3). Clonal complexity (≥ 2 mutations) was present in 7.0% (24/345), and 3.5% (12 participants) carried both M-CHIP and L-CHIP mutations. The majority (> 80%) of CHIP-positive participants harbored only a single mutation (Figure S1). Among M-CHIP mutations, 39% were non-synonymous, 30% were stop-gain, 23% were frameshift deletions, and the remainder were frameshift insertions and splicing variants. The majority of L-CHIP mutations (87%) were non-synonymous, 9% were stop-gain, and the rest were frameshift insertions/deletions (Figure S1). The most frequently mutated M-CHIP genes were DNMT3A, TET2, PPM1D, and ASXL1, and L-CHIP genes were EEF1A1, DDX11, KMT2D, ATM, and FAT2 (Figure 1C,D). 
Mutations in EEF1A1 (21%) and DDX11 (15%), particularly EEF1A1:E293K, EEF1A1:V315L, and DDX11:P368S, drove the L-CHIP signal. Among all CHIP mutations, TET2 exhibited the highest VAF (Figure S2). Consistent with prior studies [3], M-CHIP prevalence increased with age, especially for DNMT3A and TET2, which remained significant in multivariate logistic regression (Figure 1E,F, Table S4). Median ages for CHIP-positive, M-CHIP, DNMT3A-mutant, and TET2-mutant participants were 61, 62, 66.5, and 64 years respectively, compared to 59 years for CHIP-negative participants. Smoking history was positively associated with M-CHIP in both univariate and multivariate regression analyses. Associations between BMI and M-CHIP, particularly BMI and DNMT3A, may require larger sample sizes to clarify. As WTC responders age, they face increased risks for neurocognitive and motor dysfunction that resembles neurodegenerative diseases, along with cortical atrophy and cognitive impairment at midlife, associated with both physical exposures at the WTC site and chronic PTSD. Notably, participants with DDX11 mutations showed significantly higher PTSD Checklist (PCL) scores (indicating greater PTSD severity) and lower Montreal Cognitive Assessment (MoCA) scores (indicating mild cognitive impairment) (Figure 1F). MoCA score association remained significant in multivariate logistic regression. The median MoCA score for participants with DDX11 mutations was 22, falling within the range indicative of mild cognitive impairment. Given that DDX11 is a helicase involved in altering RNA secondary structure, and helicase dysfunction has been implicated in neurodegeneration, these findings suggest a link between CHIP and neurocognitive outcomes worthy of further investigation. In laboratory parameters (Figure 1F, Figure S3), M-CHIP positive cases showed lower platelet counts in both univariate and multivariate regression models compared to CHIP-negative cases. 
DNMT3A and TET2 mutation carriers had lower absolute lymphocyte counts, and TET2 mutation was also associated with lower red blood cell (RBC) counts and higher segmented neutrophils. PPM1D mutations correlated with lower platelet counts and higher mean corpuscular hemoglobin (MCH). DDX11 mutation carriers demonstrated elevated absolute lymphocyte counts, mean corpuscular volume (MCV), lymphocyte-to-monocyte ratio (LMR), and lower segmented neutrophil counts. Comparative analyses against UK Biobank data [4] highlighted both convergent (e.g., TET2 with lower lymphocyte counts and PPM1D with lower platelet counts) and divergent (e.g., M-CHIP associations with lower platelet counts and TET2 associations with elevated segmented neutrophil counts) patterns. However, multivariate analysis adjusting for age and other factors attenuated most associations. Differences may reflect WTC-debris exposure or methodological factors. Recent population-level studies suggest that abnormal complete blood counts (CBC) may predict future CHIP development, and CHIP-positive individuals with abnormal myeloid or lymphoid counts face the highest risks for corresponding malignancies. Thus, CHIP-positive WTC responders with elevated myeloid and/or lymphoid blood counts may represent a high-priority subgroup for enhanced surveillance. Given the emerging interest in targeting inflammatory pathways using inhibitors of IL-1β, NLRP3, and IRAK1 in CHIP, such interventions in high-risk CHIP-positive WTC participants may help prevent or delay overt myeloid neoplasms, CVD, major bleeding, and infection. Next, cognizant of CHIP association with immune dysfunction, we examined Human Leukocyte Antigen (HLA) type (both classes I and II) zygosity in relation to CHIP. The heterozygote advantage hypothesis suggests that heterozygous HLA genotypes may confer broader immune response capabilities than homozygous HLA genotypes. 
CHIP prevalence was statistically significantly associated with HLA-DMB homozygosity, which remained significant in multivariate logistic regression, with weaker associations between PPM1D and HLA-DQA1 homozygosity, and inverse associations between EEF1A1 and HLA-DPA1 homozygosity. These findings implicate HLA Class II antigen presentation in CHIP risk, supporting a role for immune-genetic interactions. Note that cross-study comparisons of CHIP prevalence remain challenging due to evolving definitions and lack of standardized analytical pipelines. The 2022 World Health Organization CHIP definition includes somatic mutations in myeloid malignancy-associated genes at VAF ≥ 2% (≥ 4% for X-linked gene mutations in males) in individuals without hematologic disorders or unexplained cytopenia. However, no guidelines exist on corresponding read depth requirements, as mutations with a true VAF of 2% are more likely missed at 30× than 250× depth. Studies also vary on CHIP variants/genes, without international consensus, and no established “gold standard” analysis pipelines exist for CHIP detection. Technical artifact filtering also varies between studies, further complicating cross-study comparisons. Therefore, we compared CHIP prevalence to healthy, unexposed controls from the New York area using the same CHIP calling pipeline. We observed a generally higher CHIP prevalence in the WTC cohort (Figure 1G,H, Table S5). After down-sampling for comparable sequencing coverage, the prevalence of detectable M-CHIP mutations in our WTC cohort was 7.5% (26/345) and L-CHIP was 9.9% (34/345), while in unexposed controls M-CHIP was 3.1% (9/293) and L-CHIP was 2.0% (6/293). Notably, L-CHIP mutations were statistically significantly more prevalent in WTC participants versus controls (p = 0.031). However, M-CHIP prevalence did not reach statistical significance (p = 0.693). At gene level, EEF1A1 (p = 0.024) and DDX11 (p = 0.071) were elevated in WTC responders (Figure S4). 
This is the first comprehensive characterization of CHIP mutation patterns across occupational groups in WTC responders, leveraging extensive clinical, laboratory, cognitive, and HLA zygosity data, and the first characterization of L-CHIP in a WTC-associated cohort. Our findings expand on previous research focused primarily on firefighters [5], given the documented elevated cancer risks for those who were not firefighters [6]. Furthermore, using deep WES allowed an unbiased survey across myeloid and lymphoid genes, creating a valuable resource for future studies as CHIP definitions evolve. This study has broader implications. Other populations exposed to large-scale fires and collapses, including civilians in modern warfare in populated cities, may also face elevated CHIP risks. Future research should aim to refine cancer risk prediction models that include CHIP status, blood counts, and demographic factors to guide early detection strategies. Although relative risks are increased, the absolute risk of malignancy in CHIP-positive individuals remains low, underscoring the need for careful risk stratification rather than blanket interventions. As CHIP clinics emerge at academic centers, longitudinal data will inform improved risk models, allowing high-risk individuals to be monitored appropriately and potentially enrolled in clinical trials. Study limitations include occupational confounders in our unexposed controls, inability to detect mosaic chromosomal alterations via WES, modest sample size limiting detection of subtle associations (particularly for L-CHIP), and the risk of false positives from multiple comparisons. Independent validation cohorts will be critical to confirm these findings.
P.B. and Z.H.G. conceived and designed the study. M.E.S., P.F.K., R.J.K., and Z.H.G. wrote the manuscript. B.J.L. and X.Y. recruited participants and handled sample and data collection. Z.H.G. led sample sequencing at Azenta Inc. M.E.S. performed sequence analyses, and P.F.K. performed statistical analyses. All authors were involved in the interpretation of the results. B.J.L., J.M., and P.B. edited the manuscript. Z.H.G. supervised the study. All authors approved the final manuscript.
The study received annual approval under IRB #604113 from the Committees on Research Involving Human Subjects at SBU. The authors declare no conflicts of interest. We will deposit the whole exome sequencing data of the WTC responders in the database of Genotypes and Phenotypes (dbGaP).
Supporting Information Table S1. List of (A) myeloid associated (M-CHIP) mutations and (B) lymphoid associated (L-CHIP) mutations considered in this study.
Supporting Information Table S2. Characteristics of the 293 unexposed controls from the MSCCR cohort.
Supporting Information Table S3. All CHIP mutations identified among the 345 WTC samples. (A) M-CHIP mutations identified in WTC samples; (B) L-CHIP mutations identified in WTC samples; (C) Distribution of M-CHIP mutations by genes; (D) Distribution of L-CHIP mutations by genes; (E) M-CHIP mutations identified in downsampled WTC samples; (F) L-CHIP mutations identified in downsampled WTC samples; (G) M-CHIP mutations identified in controls; (H) L-CHIP mutations identified in controls.
Supporting Information Table S4. CHIP-phenotype associations. (A) CHIP; (B) M-CHIP; (C) L-CHIP; (D) DNMT3A; (E) TET2; (F) PPM1D; (G) EEF1A1; (H) DDX11.
Supporting Information Table S5. Prevalence of M-CHIP and L-CHIP mutations in down-sampled 345 WTC debris-exposed first responders and 293 unexposed controls.
Supporting Information Figure S1. Characteristics of M-CHIP and L-CHIP mutations in 345 WTC debris-exposed first responders. (A) Number of participants with 1, 2, 3, and 4 M-CHIP mutations; (B) Number of different types of M-CHIP mutations; (C) Number of participants with 1, 2, 3, and 4 L-CHIP mutations; (D) Number of different types of L-CHIP mutations.
Supporting Information Figure S2. Variant allele fractions (VAFs) of M-CHIP and L-CHIP mutations. (A) VAF histogram of M-CHIP mutations; (B) VAF histogram of L-CHIP mutations; (C) VAF distribution of top M-CHIP gene mutations; (D) VAF distribution of top L-CHIP gene mutations. Based on VAF, TET2 exhibited the largest M-CHIP clones, while EEF1A1 and DDX11 exhibited the largest L-CHIP clones.
Supporting Information Figure S3. Comparison of key laboratory parameters between different groups. (A) Absolute lymphocyte counts; (B) Segmented neutrophil counts; (C) Platelet counts; (D) Red blood cell (RBC) counts; (E) Mean corpuscular hemoglobin (MCH); (F) Mean corpuscular volume (MCV); (G) Lymphocyte-to-monocyte ratio (LMR).
Supporting Information Figure S4. Characteristics of M-CHIP and L-CHIP mutations in 345 WTC debris-exposed responders (blue) and 293 unexposed controls (red). (A) Genes with M-CHIP mutations; (B) Genes with L-CHIP mutations.
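The letter's remark that a mutation with a true VAF of 2% is more likely to be missed at 30× than at 250× depth can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not the study's calling pipeline: it treats each read as an independent draw that carries the variant with probability equal to the VAF, ignores sequencing error, and uses a hypothetical threshold of three supporting reads (`MIN_ALT_READS`) for a call.

```python
from math import comb


def prob_at_least(k_min, depth, vaf):
    """P(X >= k_min) for X ~ Binomial(depth, vaf).

    X models the number of reads supporting the variant at a site
    covered by `depth` reads when the true variant allele fraction is `vaf`.
    """
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(k_min)
    )
    return 1.0 - p_below


MIN_ALT_READS = 3  # hypothetical caller threshold, not taken from the letter
TRUE_VAF = 0.02    # the 2% VAF cut-off quoted in the letter

p30 = prob_at_least(MIN_ALT_READS, 30, TRUE_VAF)
p250 = prob_at_least(MIN_ALT_READS, 250, TRUE_VAF)
print(f"30x:  P(detect) = {p30:.3f}")
print(f"250x: P(detect) = {p250:.3f}")
```

Under these assumptions the expected number of supporting reads is 0.6 at 30× and 5 at 250×, which is why the detection probability differs so sharply between the two depths.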

  • Lung Cancer Incidence After September 11, 2001, Among World Trade Center Responders

    JAMA Network Open · 2025-10-09 · 2 citations

    article · Open access

    Importance: Responders involved in rescue and recovery operations after the collapse of the World Trade Center (WTC) on September 11, 2001, were exposed to airborne carcinogens. Objectives: To examine the incidence of lung cancer after the WTC attacks and to compare the incidence of lung cancer among responders with varying degrees of exposure severity. Design, Setting, and Participants: In this prospective cohort study, data were collected between July 1, 2012, and December 31, 2023, from individuals who were enrolled in a medical monitoring program available to WTC responders residing on Long Island, New York. This study was restricted to people who survived and were followed up for incident lung cancer after a 10-year latency period. Exposures: Types and durations of exposures were based on responses to a detailed questionnaire about on-site work conditions, which included information about the type and duration of work, smells, and sights while working; exposure to dust; and the use of protective equipment. World Trade Center exposure characteristics and overall severity were measured as mild, moderate, and severe exposure using a validated approach. Main Outcomes and Measures: The incidence of lung cancer was the primary outcome. Diagnosis of lung cancer was ascertained following a standardized approach by trained clinicians, and diagnoses were verified by clinicians at the Centers for Disease Control and Prevention. Cox proportional hazards regression was used to estimate multivariable-adjusted hazard ratios. Result: Among 12 334 eligible responders (mean [SD] age at study inclusion, 49.3 [10.2] years; 11 213 men [90.9%]), 118 incident lung cancers were identified between July 1, 2012, and December 31, 2023 (incidence rate, 8.7/10 000 person-years [95% CI, 7.3-10.5 person-years]). 
When compared with mild exposures, the incidence of lung cancer was higher among moderately (adjusted hazard ratio [AHR], 1.86 [95% CI, 1.19-2.91]; P = .007) and severely (AHR, 2.90 [95% CI, 1.69-4.99]; P < .001) exposed groups. Specific WTC exposures, including smelling fumes (AHR, 1.05 [95% CI, 1.01-1.09]; P = .007) or sewage (AHR, 1.03 [95% CI, 1.01-1.05]; P = .004), were also associated with higher incidence of lung cancer after adjusting for demographics and measures of tobacco use. Conclusions and Relevance: In this cohort study of WTC responders, the incidence of lung cancer was higher among those with greater exposure severity. Future studies may investigate specific WTC exposures and histologic changes and clarify the role of WTC exposure for prognosis.
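The hazard ratios, confidence intervals, and P values reported in this abstract can be cross-checked against one another. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale, which the abstract does not state; the function name is hypothetical. It recovers the standard error of the log hazard ratio from the interval and then computes a two-sided normal P value.

```python
from math import erfc, log, sqrt


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover (se, z, p) from a hazard ratio and its 95% CI.

    Assumes the CI is exp(log(hr) +/- z_crit * se), i.e. a Wald
    interval on the log-hazard scale.
    """
    se = (log(upper) - log(lower)) / (2.0 * z_crit)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided p-value under a standard normal
    return se, z, p


# Values quoted in the abstract for moderate and severe exposure vs mild.
se_mod, z_mod, p_mod = wald_from_ci(1.86, 1.19, 2.91)
se_sev, z_sev, p_sev = wald_from_ci(2.90, 1.69, 4.99)
print(f"moderate: z = {z_mod:.2f}, p = {p_mod:.4f}")
print(f"severe:   z = {z_sev:.2f}, p = {p_sev:.5f}")
```

Under this assumption the recovered P values are consistent with the reported P = .007 and P < .001, up to rounding of the published estimates.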

  • Transformed ROC Curve for Biomarker Evaluation

    Statistics in Medicine · 2024-11-12 · 4 citations

    article

    To complement the conventional area under the ROC curve (AUC) which cannot fully describe the diagnostic accuracy of some non-standard biomarkers, we introduce a transformed ROC curve and its associated transformed AUC (TAUC) in this article, and show that TAUC can relate the original improper biomarker to a proper biomarker after a non-monotone transformation. We then provide nonparametric estimation of the non-monotone transformation and TAUC, and establish their consistency and asymptotic normality. We conduct extensive simulation studies to assess the performance of the proposed TAUC method and compare with the traditional methods. Case studies on real biomedical data are provided to illustrate the proposed TAUC method. We are able to identify more important biomarkers that tend to escape the traditional screening method.
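The motivation in this last abstract, that the conventional AUC can miss a biomarker whose relationship to disease is non-monotone, can be illustrated with a toy example. This is not the paper's TAUC estimator: the paper estimates the transformation nonparametrically, whereas this sketch applies a fixed, hypothetical transformation (distance from a chosen center) to made-up data in which cases have either very low or very high values.

```python
def empirical_auc(cases, controls):
    """Mann-Whitney estimate of P(case > control); ties count as 1/2."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def distance_from_center(value, center=5.0):
    """Hypothetical non-monotone transformation: distance from `center`."""
    return abs(value - center)


# Made-up biomarker values: controls sit in the middle, cases at both extremes.
controls = [4.0, 4.5, 5.0, 5.5, 6.0]
cases = [1.0, 2.0, 8.0, 9.0]

raw_auc = empirical_auc(cases, controls)
transformed_auc = empirical_auc(
    [distance_from_center(x) for x in cases],
    [distance_from_center(y) for y in controls],
)
print("AUC on raw values:        ", raw_auc)
print("AUC after transformation: ", transformed_auc)
```

On the raw scale the low and high cases cancel out and the biomarker looks uninformative, while the transformed values separate cases from controls.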

Awards & honors

  • Three-time recipient of the CEAS Dean’s Millionaire Club awa…
  • Frey Family Foundation Professorship, 2021