Yang Ni

Assistant Professor · Verified

Texas A&M University · Statistics

Active 2008–2026

h-index: 17
Citations: 2.3k
Papers: 118 (61 in the last 5 years)
Funding: $232k

Research topics

  • Artificial Intelligence
  • Computer Science
  • Data Mining
  • Machine Learning
  • Biology
  • Bioinformatics
  • Accounting
  • Finance
  • Computational biology
  • Economics
  • Monetary economics
  • Business
  • Financial system
  • Data science
  • Theoretical computer science
  • Algorithm
  • Genetics

Selected publications

  • Preprocessed single-cell RNA-seq and donor data for MR-CCC: Bayesian Mendelian Randomization for Causal Cell-Cell Communication

    Zenodo (CERN European Organization for Nuclear Research) · 2026-04-21

    dataset · Open access · Senior author

    Preprocessed data from the OneK1K cohort (Yazar et al., Science 2022) used in the MR-CCC analysis of causal cell-cell communication. B_T_NK_Monocytes.rda contains pseudo-bulk gene expression count matrices (genes × donors) for five cell types: B cells, CD4+ T cells, CD8+ T cells, NK cells, and monocytes. Each matrix was aggregated from single-cell counts per donor and library-size normalized. donor.rda contains donor-level metadata and genotype information, including donor covariates (age, sex, ancestry principal components), a SNP genotype matrix (dosage-encoded), and GRanges objects for SNP and gene genomic coordinates used for cis-eQTL instrument construction. These files are intended for use with the MR-CCC analysis scripts.

    Analysis scripts: https://github.com/bitansa/MR-CCC
    Original OneK1K data: https://onek1k.org

  • Truncated Gaussian copula principal component analysis with application to pediatric acute lymphoblastic leukemia patients’ gut microbiome

    Statistical Methods in Medical Research · 2026-01-23

    article

    Increasing epidemiologic evidence suggests that the diversity and composition of the gut microbiome can predict infection risk in cancer patients. Infections remain a major cause of morbidity and mortality during chemotherapy. Analyzing microbiome data to identify associations with infection pathogenesis for proactive treatment has become a critical research focus. However, the high-dimensional nature of the data necessitates the use of dimension-reduction methods to facilitate inference and interpretation. Traditional dimension reduction methods, which assume Gaussianity, perform poorly with skewed and zero-inflated microbiome data. To address these challenges, we propose a semiparametric principal component analysis method based on a truncated latent Gaussian copula model that accommodates both skewness and zero inflation. Simulation studies demonstrate that the proposed method outperforms existing approaches by providing more accurate estimates of scores and loadings across various copula transformation settings. We apply our method, along with competing approaches, to gut microbiome data from pediatric patients with acute lymphoblastic leukemia. The principal scores derived from the proposed method reveal the strongest associations between pre-chemotherapy microbiome composition and adverse events during subsequent chemotherapy, offering valuable insights for improving patient outcomes.

  • Bayesian latent ising model for joint microbial and metabolomic network inference

    Journal of Applied Statistics · 2026-03-06

    article · 1st author

  • OneK1K B-cell dataset used for MR.RGM real-data analysis

    Open MIND · 2026-02-05

    dataset

    This record contains the real-data resources used in the OneK1K-based analyses of the MR.RGM and MR.RGM+ methods. The upload includes one R data archive, donor_b_cell.rda. This file contains derived and curated data objects for B-cell samples from the OneK1K project, including donor-level covariates (age, sex, genotype PCs), genotype dosage matrices aligned to donor IDs, gene-level RNA count matrices aggregated at the donor level, and genomic annotations for variants and genes. The data have been preprocessed to enable direct use in the real-data analysis scripts provided in the associated code repository. The files in this record are provided to ensure full reproducibility of the OneK1K real-data analyses reported in the associated manuscript. Original OneK1K data were generated by the OneK1K Consortium. This record redistributes derived and reorganized data products for methodological reproducibility only.

  • GTEx v7 muscle skeletal tissue data used for MR.RGM real-data analysis

    Open MIND · 2026-02-04

    dataset · Senior author

    This record contains the real-data resources used in the GTEx-based analyses of the MR.RGM and MR.RGM+ methods. The upload includes two archives. (1) GTEx.zip contains preprocessed genotype and gene expression matrices derived from the GTEx v7 project for muscle skeletal tissue, prepared for direct use in the real-data analysis scripts provided in the associated code repository. (2) GTEx_Analysis_v7_eQTL.tar.gz contains publicly available GTEx v7 eQTL summary files for muscle skeletal tissue, including significant variant–gene pairs and eGenes, downloaded from the GTEx Portal. The files in this record are provided to enable full reproducibility of the real-data analyses reported in the associated manuscript. Users can download and extract the archives and run the provided R scripts without additional preprocessing. Original GTEx data were generated by the GTEx Consortium. This record redistributes derived and reorganized data products for methodological reproducibility only.

  • GTEx v7 muscle skeletal tissue data used for MR.RGM real-data analysis

    Zenodo (CERN European Organization for Nuclear Research) · 2026-02-04

    dataset · Open access · Senior author

  • Preprocessed single-cell RNA-seq and donor data for MR-CCC: Bayesian Mendelian Randomization for Causal Cell-Cell Communication

    Open MIND · 2026-04-21

    dataset · Open access · Senior author

  • Multi-omics insights into GBA1-associated Parkinson’s disease: interplay of genomics, transcriptomics, proteomics, and lipidomics

    Molecular Neurodegeneration · 2026-01-29 · 1 citation

    article · Open access · 1st author

    Parkinson’s disease (PD) is the second most prevalent neurodegenerative disorder worldwide. The pathogenesis of PD is driven by multifactorial mechanisms involving a complex interplay among environmental exposures, genetic susceptibility, and aging-related processes. Among genetic contributors, heterozygous pathogenic variants in the GBA1 gene represent the most significant heritable risk factor for PD. The disease mechanisms of GBA1 defects in PD remain incompletely understood. It has been proposed that a partial loss-of-function of the lysosomal enzyme glucocerebrosidase, or potential toxic gain-of-function effects (e.g., endoplasmic reticulum stress), might contribute to the disease. These processes initiate a cascade of pathophysiological events, including dysregulated sphingolipid metabolism, compromised lysosomal-autophagic function, mitochondrial dysfunction, and accelerated α-synuclein aggregation. Subsequent dopaminergic neurodegeneration and sustained neuroinflammatory cascades ultimately drive PD progression. Nevertheless, the precise molecular mechanisms linking GBA1 mutations to PD pathogenesis remain incompletely elucidated, and clinically validated early diagnostic biomarkers for GBA1-associated PD (GBA1-PD) are still lacking. This review summarizes the distinct clinical phenotypes and mechanistic underpinnings of GBA1-PD, with particular emphasis on omics-derived stratification biomarkers (identified through genomics, transcriptomics, proteomics, and lipidomics approaches) coupled with neuroimaging signatures. This review advances our understanding of GBA1-mediated PD pathogenesis while providing a framework for developing precision diagnostic strategies and targeted therapeutic interventions addressing PD heterogeneity.

  • OneK1K B-cell dataset used for MR.RGM real-data analysis

    Zenodo (CERN European Organization for Nuclear Research) · 2026-02-05

    dataset · Open access

  • PACKETCLIP: multi-modal embedding of network traffic and language for cybersecurity reasoning

    Frontiers in Artificial Intelligence · 2025-07-28 · 6 citations

    article · Open access

    Traffic classification is vital for cybersecurity, yet encrypted traffic poses significant challenges. We introduce PACKETCLIP, a multi-modal framework combining packet data with natural language semantics through contrastive pre-training and hierarchical Graph Neural Network (GNN) reasoning. PACKETCLIP integrates semantic reasoning with efficient classification, enabling robust detection of anomalies in encrypted network flows. By aligning textual descriptions with packet behaviors, PACKETCLIP offers enhanced interpretability, scalability, and practical applicability across diverse security scenarios. With a 95% mean AUC, an 11.6% improvement over baselines, and a 92% reduction in intrusion detection training parameters, it is ideally suited for real-time anomaly detection. By bridging advanced machine-learning techniques and practical cybersecurity needs, PACKETCLIP provides a foundation for scalable, efficient, and interpretable solutions to tackle encrypted traffic classification and network intrusion detection challenges in resource-constrained environments.

Frequent coauthors

  • Yuan Ji

    University of Chicago

    32 shared
  • Francesco C. Stingo

    University of Florence

    20 shared
  • Peter Müller

    GeoSphere Austria

    20 shared
  • Veerabhadran Baladandayuthapani

    University of Michigan–Ann Arbor

    19 shared
  • Ruijie Gong

    Anhui University

    17 shared
  • Suping Wang

    Shanxi Medical University

    14 shared
  • Yong Cai

    Shanghai Jiao Tong University

    12 shared
  • Jeffrey Pittman

    Virginia Tech

    10 shared

Labs

  • Statistics Department of Texas A&M University · PI

Education

  • PhD, Statistics

    Rice University

    2015