
Dabao Zhang
Professor of Epidemiology & Biostatistics · University of California, Irvine · Epidemiology & Biostatistics
Active 2004–2026
About
Dabao Zhang is a Professor of Epidemiology and Biostatistics at UC Irvine Wen Public Health. His research and scholarship interests include statistical and computational methodology, such as the construction of large causal systems, exploratory analysis and visualization of big data, generalized linear (mixed) models, integrative analysis of big data, meta-analysis, multivariate extreme values, high-dimensional variable selection, supervised dimension reduction, survival analysis, and transfer learning. Additionally, he specializes in statistical genetics and bioinformatics, focusing on causal inference of transcriptome-wide gene regulatory networks, epistatic interactions, genetic heritability, gene-environment interactions, genome-wide association studies, genomic selection, molecular signature identification, integrative omics data analysis, Mendelian randomization, and pan-cancer analysis of gene regulatory networks. Professor Zhang holds a Ph.D. in Statistics from Cornell University, an M.Sc. in Probability & Statistics from Peking University, and a B.Sc. in Mathematical Statistics from Nankai University. His notable contributions include developing exploratory tools for visualizing relational structures among massive variables in big data, creating computational algorithms to infer biological causality between molecular variables and clinical phenotypes, and defining measures to address the explainability of AI models. His work also involves leveraging generative models to enhance statistical analysis of text data. His research has led to significant advancements in statistical methodology and bioinformatics, with applications across health sciences and genomics.
Research topics
- Computer Science
- Econometrics
- Economics
- Statistics
- Mathematics
- Finance
- Botany
- Genetics
- Biology
Selected publications
Structure-Aware Dynamic Fusion with Modality Balance for Multimodal KGC
Lecture notes in computer science · 2026-01-01
book-chapter · Senior author
qshap: Fast Calculation of Feature Contributions in Boosting Trees
2026-03-16
dataset · Open access · Senior author
Computes feature-specific R-squared (R2) contributions for boosting tree models using a Shapley-value-based decomposition of the total R-squared in polynomial time. Supports models fitted with 'XGBoost' and 'LightGBM', and provides efficient parallel implementations suitable for large-scale problems. Multiple visualization tools are included for interpreting and communicating feature contributions. The methodology is described in Jiang, Zhang, and Zhang (2025) <doi:10.48550/arXiv.2407.03515>.
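The idea of splitting a model's total R-squared into Shapley-value contributions can be illustrated on a toy problem. The sketch below is not the qshap algorithm: it assumes an ordinary least-squares model instead of boosting trees, and it enumerates every feature subset by brute force (exponential time) rather than using the polynomial-time method the package describes. All function names and the simulated data are hypothetical.

```python
from itertools import combinations
from math import factorial
import random


def center(values):
    """Subtract the mean so that no explicit intercept is needed."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def r_squared(cols, y, subset):
    """R^2 of least squares on the chosen (centered) feature columns."""
    if not subset:
        return 0.0
    XtX = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in subset]
           for i in subset]
    Xty = [sum(a * b for a, b in zip(cols[i], y)) for i in subset]
    beta = solve(XtX, Xty)
    explained = sum(b * v for b, v in zip(beta, Xty))
    return explained / sum(v * v for v in y)


def shapley_r2(cols, y):
    """Brute-force Shapley decomposition of R^2 across features."""
    p = len(cols)
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = r_squared(cols, y, S + (j,)) - r_squared(cols, y, S)
                phi[j] += weight * gain
    return phi


# Simulated data (an assumption for illustration): y depends strongly on
# feature 0, weakly on feature 1, and not at all on feature 2.
random.seed(0)
n = 200
raw = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
y_raw = [2 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(raw[0], raw[1])]
cols = [center(c) for c in raw]
y = center(y_raw)

phi = shapley_r2(cols, y)
full = r_squared(cols, y, (0, 1, 2))
print("feature contributions:", phi)
print("sum of contributions:", sum(phi), "full-model R^2:", full)
```

By the efficiency property of Shapley values, the per-feature contributions add up to the R-squared of the full model, which is what makes the decomposition readable as a share of explained variation.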
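Understanding molecular mechanisms of prostate cancer via transcriptome-wide causal gene regulatory networks
Cancer Research · 2026-04-03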
article · Senior author
Background: Prostate cancer remains the most common malignancy among men and a leading cause of cancer-related death worldwide, with an estimated 313,780 new cases expected in 2025. Despite recent advances in screening and treatment, the molecular mechanisms driving prostate cancer are not fully understood. The growing availability of multi-omics data from the same patients provides an unprecedented opportunity to reveal disease pathways and identify novel therapeutic targets. However, integrating these large-scale, multi-modal, and heterogeneous omics profiles poses substantial statistical and computational challenges. Methods: Transcriptomic and genomic data were obtained from prostate tumor samples in GEO (GSE70768). After pre-processing and quality control, the dataset included 17,426 genes and 272,564 single nucleotide polymorphisms (SNPs) from 90 patients. We applied SIGNET to identify instrumental variables (IVs), yielding 7,806 gene-IV pairs for 3,309 genes. Using these IVs, we then applied SIGNET to conduct causal inference and construct transcriptome-wide gene regulatory networks for prostate cancer based on the integrated genomic and transcriptomic data, supported by 100 bootstrap datasets. Results: We identified 1,840 gene regulations that were repeatedly recovered in ≥80% of the bootstrap datasets, of which 369 appeared in ≥95% of the constructions. Within these robust subnetworks, we detected hub genes including DDX51, PNPT1, FARSLB, and IFI6. Using data from the Cancer Genome Atlas (TCGA) project, we validated that IFI6 is highly correlated with its predicted targets (correlation coefficient 0.52 to 0.90, p < 0.01). IFI6 is a negative regulator of innate immunity and has been reported to be overexpressed in multiple cancers, with emerging evidence supporting its role in tumorigenesis and drug resistance.
Our findings nominate IFI6 as a candidate regulator in prostate cancer, warranting further functional studies to define its role in tumor proliferation, metastasis, therapy responses, and the immune tumor microenvironment. Finally, Ingenuity Pathway Analysis (IPA) of top subnetworks with high bootstrap frequency highlighted several significant pathways, including primary immunodeficiency signaling and communication between innate and adaptive immune cells. Conclusion: Using multi-omics data from prostate cancer tissues coupled with transcriptome-wide causal inference, our data-driven detection of regulator-target pairs provides new insights into the molecular mechanisms of prostate cancer and may ultimately facilitate the development of personalized treatment strategies. Citation Format: Min Zhang, Zhongli Jiang, Xiaolin Zi, Danni Liu, Yan Li, Dabao Zhang. Understanding molecular mechanisms of prostate cancer via transcriptome-wide causal gene regulatory networks [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 6868.
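The bootstrap filtering step in this abstract (keeping regulations recovered in at least 80% or 95% of the bootstrap datasets) can be sketched as a simple frequency count. This is only an illustration of the thresholding idea, not SIGNET itself: the function name, the edge representation as (regulator, target) pairs, and the placeholder gene labels G1 to G3 are all assumptions, and the toy networks are not real results.

```python
from collections import Counter


def stable_edges(bootstrap_networks, threshold):
    """Return edges whose recovery frequency across replicates meets the threshold.

    bootstrap_networks: one collection of (regulator, target) pairs per replicate.
    threshold: minimum fraction of replicates in which an edge must appear.
    """
    n = len(bootstrap_networks)
    # set() guards against an edge being listed twice within one replicate.
    counts = Counter(edge for net in bootstrap_networks for edge in set(net))
    return {edge: c / n for edge, c in counts.items() if c / n >= threshold}


# Five toy replicates with placeholder gene labels (hypothetical data).
networks = [
    {("G1", "G2"), ("G2", "G3")},
    {("G1", "G2")},
    {("G1", "G2"), ("G3", "G1")},
    {("G1", "G2"), ("G2", "G3")},
    {("G2", "G3"), ("G1", "G2")},
]

robust = stable_edges(networks, 0.8)
print(robust)
```

In this toy input only the G1 to G2 edge appears in every replicate, so it is the only one that passes the 80% cutoff; the G2 to G3 edge appears in three of five replicates and is dropped.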
From Correlation to Causation: Cell-Type-Specific Gene Regulatory Networks in Alzheimer’s Disease
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-25
preprint · Open access
INTRODUCTION: Alzheimer's disease (AD) involves complex regulatory disruptions across multiple brain cell types, yet a comprehensive understanding of the intracellular causal mechanisms remains unclear. METHODS: We presented an integrative analysis framework using single-nucleus transcriptomic with matched subject-level genotype data from 272 human AD in the Religious Orders Study and the Rush Memory and Aging Project (ROSMAP) study, and constructed causality-based, cell-type-specific gene regulatory networks (GRNs). RESULTS: Our method identifies regulatory genes from both transcription factors (TFs) and non-TFs, thereby capturing a complete and accurate causal regulatory map across different brain cell types. This work revealed both established and novel regulations, pathways, and cell-type-unique hub genes in AD. Beyond constructing transcriptome-wide GRNs, we quantitatively assessed hub genes and distinguished those with regulatory or responsive roles. DISCUSSION: Our study provides a comprehensive mapping of cell-type-specific causal GRNs in AD, providing a powerful resource for dynamic pathway exploration, hypothesis generation, and functional interpretation.
Frontiers in Materials · 2025-03-20 · 1 citation
article · Open access
An Ag-20 vol.% V₂AlC composite material was prepared using the spark plasma sintering method. The influence of the number of arc discharges on the electrical contact performance of Ag-V₂AlC composites was systematically investigated. For the first time, we observed that the arc ablation mechanism evolves with increasing discharge cycles. During single arc ablation, the arc preferentially discharges the Ag phase owing to its lower work function. This process creates a relatively flat ablation region where the V₂AlC reinforcement and Ag matrix remain distinct. The V₂AlC phase acts as a pinning agent within the Ag matrix, effectively suppressing material splatter. After 10 discharge cycles, the ablation edge of the Ag-V₂AlC material develops a mountain-like morphology. This structure prevents material splashing and results in no pores or splatter on the surface. The phase boundary between V₂AlC and Ag becomes less distinct, while the breakdown current stabilizes between 19.9 A and 24.1 A. Concurrently, the breakdown strength fluctuates within 4.3 × 10⁶ V/m to 8.2 × 10⁶ V/m. Following 100 discharge cycles, the Ag and V₂AlC phases are no longer distinguishable in the ablation area. Micro-protrusions form in the central ablation region, enhancing the local electric field and ultimately reducing the breakdown strength. As discharges increase further, the concentration of low-work-function oxides (V₂O₅, Al₂O₃, and Ag₂O) rises. These oxides dominate the arc discharge process, further diminishing the breakdown strength. Consequently, the breakdown strength exhibits a three-stage decreasing trend. Although the ablation area expands with discharge cycles, oxide formation increases the molten pool viscosity, preventing significant splatter at the ablation edge. These findings provide a theoretical foundation for designing novel electrical contact materials with enhanced performance.
Exploring Massive Risk Factors of Categorical Outcomes via Supervised Dimension Reduction
Journal of Data Science · 2025-01-01 · 1 citation
article · Open access · Senior author
We propose to explore high-dimensional data with categorical outcomes by generalizing the penalized orthogonal-components regression method (POCRE), a supervised dimension reduction method initially proposed for high-dimensional linear regression. This generalized POCRE, i.e., gPOCRE, sequentially builds up orthogonal components by selecting predictors which maximally explain the variation of the response variables. Therefore, gPOCRE simultaneously selects significant predictors and reduces dimensions by constructing linear components of these selected predictors for a high-dimensional generalized linear model. For multiple categorical outcomes, gPOCRE can also construct common components shared by all outcomes to improve the power of selecting variables shared by multiple outcomes. Both simulation studies and real data analysis are carried out to illustrate the performance of gPOCRE.
Characterizing Autism Spectrum Disorder in the All of Us Research Program
medRxiv · 2025-09-12
preprint · Open access · Corresponding
Many autism spectrum disorder (ASD) studies have been established for early diagnosis in youth, which makes it difficult to investigate issues arising in adults, such as heterogeneity in symptoms and the timing of clinical manifestations. The All of Us Research Program is a nationwide precision medicine initiative that provides an important cohort consisting of 393,596 participants with health records, survey data, and whole genome sequencing data, making it a unique resource to study ASD adults and other complex diseases such as cancer. In this study, we identified 1,049 ASD cases and 22,777 non-autistic controls from the All of Us cohort. Autistic adults presented a high rate (over 90%) of co-occurring psychiatric conditions, with over 60% receiving a delayed ASD diagnosis after other mental disorders. Comparison analysis of the 739 matched cases and controls showed statistically significant differences in various sociodemographic factors, including marriage, employment rates, and annual income.
Prediction Interval Transfer Learning for Linear Regression Using an Empirical Bayes Approach
Stat · 2025-01-16 · 1 citation
article · Open access · Senior author · Corresponding
Current literature on transfer learning has been focused on improving the predictive performance corresponding to a small dataset by transferring information to it from a larger but possibly biassed dataset. However, the transfer learning methods currently available do not allow the computation of prediction intervals, and hence, one has to rely on using either the small dataset alone or combining it with the possibly biassed dataset to obtain prediction intervals using traditional linear regression methods. In this article, we propose an Empirical Bayes approach for Prediction Interval Transfer Learning (EB‐PITL), to compute prediction intervals for transfer learning in linear regression tasks. We have proved that the Gibbs sampler associated with EB‐PITL is geometrically ergodic, so EB‐PITL can also quantify the Monte Carlo uncertainty associated with its predicted value. The efficiency of EB‐PITL against currently available methods is demonstrated using simulation studies and by analysing the Tetouan City power consumption dataset.
Uniform Inference for Central and Tail Distributions with Censored Data
SSRN Electronic Journal · 2024-01-01
preprint · Open access
Refining Kaplan-Meier Estimation with the Generalized Pareto Model for Survival Analysis
SSRN Electronic Journal · 2024-01-01 · 1 citation
preprint · Open access
Recent grants
CAREER: A New Regularization Framework for Identifying Composite Signatures
NSF · $433k · 2009–2014
Modeling Homeostasis of Human Blood Metabolites
NIH · $2.2M · 2020–2025
Measuring Explained Variation in Survival Analysis
NIH · $143k · 2019–2024
Frequent coauthors
- Martin T. Wells · Cornell University · 23 shared
- Min Zhang · Zhengzhou People's Hospital · 22 shared
- Patricia A. Cassano · 16 shared
- Bruce W. Turnbull · University of Leeds · 16 shared
- David Sparrow · VA Boston Healthcare System · 16 shared
- Vitara Pungpapong · 14 shared
- Min Zhang · Kunming Medical University · 10 shared
- Daniel Raftery · University of Washington · 9 shared
Awards & honors
- Purdue University College of Science Outstanding Service Awa…
- Purdue University Seed for Success Award 2011, 2020
- National Science Foundation CAREER Award 2009
- Purdue University College of Science Interdisciplinary Award…
- Cornell University Liu Memorial Award 2003