
Ruoqing Zhu
Associate Professor · University of Illinois Urbana-Champaign · Statistics
Active 2008–2026
About
Ruoqing Zhu is an Associate Professor in the Department of Statistics at the University of Illinois Urbana-Champaign and serves as the PhD Program Director. He completed his Ph.D. in Biostatistics at the University of North Carolina at Chapel Hill in 2013 and has since held positions including Postdoctoral Associate at Yale University and faculty roles at UIUC. His research develops statistical methodology, theory, and computational algorithms for decision-making problems, with a focus on personalized medicine and reinforcement learning. He aims to address issues such as unrealistic model assumptions, unstable performance, interpretability challenges, and complexities arising from small sample sizes, high dimensionality, and complex data structures. His recent work emphasizes uncertainty quantification, distributional shift, causal inference, and trustworthiness to develop reliable solutions for real-world applications. Zhu also has a strong interest in classical statistical learning and machine learning methods, including random forests, sufficient dimension reduction, and survival analysis, with applications in bioinformatics, infectious diseases, and nutrition studies.
Research topics
- Biology
- Computer Science
- Medicine
- Internal medicine
- Pathology
- Machine Learning
- Biochemistry
- Cancer research
- Gastroenterology
- Immunology
- Cell biology
- Econometrics
- Mathematics
- Psychology
- Food science
- Statistics
- Bioinformatics
- Ecology
- Engineering
- Oncology
- Andrology
Selected publications
Rectified Fisher-Bingham Model for Compositional Data with Zeros
arXiv (Cornell University) · 2026-04-27
Preprint · Open access · Senior author
This paper introduces a rectified and renormalized Fisher-Bingham model for compositional data with zeros, motivated in part by the presence of zeros in microbiota studies. The approach represents compositions through a square-root transformation that maps data to the positive orthant of the unit sphere, and models them via a latent Fisher-Bingham followed by a deterministic transformation that induces exact zeros. This construction yields a coherent likelihood without requiring zero imputation or separate modeling of zero and nonzero components. Parameter estimation is performed using a Monte Carlo expectation-maximization algorithm that accommodates the latent structure. We further develop a score test for detecting structured differences in composition across groups, providing a parametric alternative to commonly used distance-based methods. Simulation studies demonstrate that the proposed method closely approximates the induced distribution and achieves higher power for detecting structured compositional changes, particularly when observations include many zero-valued components. An application to a dietary intervention study illustrates that the method identifies meaningful microbiota shifts not detected by standard approaches.
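As a small illustration of the square-root transformation mentioned in the abstract above, the following sketch maps a composition (nonnegative parts) onto the positive orthant of the unit sphere. The function name `sqrt_transform` is hypothetical and this is not the authors' code; it shows only the transformation step, not the rectified Fisher-Bingham model or its estimation.

```python
import math

def sqrt_transform(composition):
    """Map nonnegative parts to a point on the unit sphere.

    The parts are first closed to sum to 1, then each part is replaced
    by its square root, so the squared coordinates sum to 1.
    Zero parts stay exactly zero.
    """
    total = sum(composition)
    if total <= 0 or any(x < 0 for x in composition):
        raise ValueError("composition must be nonnegative with a positive sum")
    return [math.sqrt(x / total) for x in composition]

# Example: a three-part composition with one exact zero.
point = sqrt_transform([2.0, 2.0, 0.0])
squared_norm = sum(v * v for v in point)
```

Because zeros map to zeros, a composition with zero-valued components lands on the boundary of the positive orthant, which is the situation the abstract says the model is built to handle.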
medRxiv · 2025-10-05
Preprint · Open access
BACKGROUND Early identification of patients at risk for sepsis, mortality, and clinical deterioration is essential for improving outcomes, but existing diagnostic and predictive tools have limited accuracy. The objective was to evaluate the performance of an FDA-authorized AI tool, the Sepsis ImmunoScore, compared to widely available biomarkers and clinical tools for diagnosis of sepsis and prediction of in-hospital mortality and intensive care unit (ICU) admission. METHODS This multicenter observational study included 6,027 adult patients suspected of infection across 7 U.S. hospital sites. The Sepsis ImmunoScore’s predictive performance was compared to the sequential organ failure assessment (SOFA) score, procalcitonin (PCT), C-reactive protein (CRP), Systemic Inflammatory Response Syndrome (SIRS) score, National Early Warning Score (NEWS), and quick SOFA (qSOFA). Primary outcomes included sepsis as defined by Sepsis-3 criteria, in-hospital mortality, and ICU admission. Predictive accuracy was assessed using area under the receiver operating characteristic curve (AUC), and 95% confidence intervals were generated and hypothesis testing conducted using the bootstrap method. RESULTS The Sepsis ImmunoScore demonstrated statistically significant superior performance across all outcomes. For sepsis prediction, the Sepsis ImmunoScore achieved an AUC of 0.82, compared to SOFA (0.72), procalcitonin (PCT) (0.70), C-reactive protein (CRP) (0.61), SIRS (0.59), NEWS (0.69), and qSOFA (0.67). For in-hospital mortality prediction, the Sepsis ImmunoScore achieved an AUC of 0.80, outperforming SOFA (0.72), PCT (0.67), CRP (0.58), SIRS (0.60), NEWS (0.72), and qSOFA (0.69). For ICU admission, the Sepsis ImmunoScore reached an AUC of 0.74, superior to SOFA (0.63), PCT (0.64), CRP (0.54), SIRS (0.60), NEWS (0.70), and qSOFA (0.65). All differences between the Sepsis ImmunoScore and comparators were statistically significant.
CONCLUSIONS The Sepsis ImmunoScore significantly improved predictive accuracy for sepsis, in-hospital mortality, and ICU admission compared to six conventional clinical scores and biomarkers. This AI-based tool may enhance risk stratification and clinical decision-making, potentially leading to more timely sepsis interventions and improved outcomes. KEY POINTS Question How does the FDA-authorized Sepsis ImmunoScore compare to conventional sepsis tools at diagnosing and predicting sepsis, clinical deterioration, and in-hospital mortality? Findings In a multicenter observational cohort of 6,027 patients with suspected infection, the Sepsis ImmunoScore demonstrated statistically superior performance compared to PCT, CRP, SOFA, qSOFA, SIRS, and NEWS in predicting all outcomes: sepsis diagnosis, ICU admission, and in-hospital mortality. Meaning Because the Sepsis ImmunoScore outperforms existing sepsis diagnostics, it could potentially enhance risk stratification and clinical decision-making for patients with suspected infection, enabling more appropriate and timely interventions.
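The abstract above reports AUCs with 95% confidence intervals from "the bootstrap method" without further detail. The sketch below shows one common variant, a percentile bootstrap over patients, using only the Python standard library. The percentile choice, the function names, and the toy data are assumptions for illustration and are not taken from the study.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed variant)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Toy data (made up): 1 = outcome present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auc(toy_labels, toy_scores)
interval = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

Comparing two tools on the same patients would typically resample patients once per replicate and compute the AUC difference within each replicate, so that the pairing is preserved.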
Probabilistic exponential family inverse regression and its applications
Biometrics · 2025-04-02
Article
Rapid advances in high-throughput sequencing technologies have led to the fast accumulation of high-dimensional data, which is harnessed for understanding the implications of various factors on human disease and health. While dimension reduction plays an essential role in high-dimensional regression and classification, existing methods often require the predictors to be continuous, making them unsuitable for discrete data, such as presence-absence records of species in community ecology and sequencing reads in single-cell studies. To identify and estimate sufficient reductions in regressions with discrete predictors, we introduce probabilistic exponential family inverse regression (PrEFIR), assuming that, given the response and a set of latent factors, the predictors follow one-parameter exponential families. We show that the low-dimensional reductions result not only from the response variable but also from the latent factors. We further extend the latent factor modeling framework to the double exponential family by including an additional parameter to account for the dispersion. This versatile framework encompasses regressions with all categorical or a mixture of categorical and continuous predictors. We propose the method of maximum hierarchical likelihood for estimation, and develop a highly parallelizable algorithm for its computation. The effectiveness of PrEFIR is demonstrated through simulation studies and real data examples.
Integrating Prior Knowledge From Genome-Scale Metabolic Model With Metabolomics for Diet Assessment
IEEE Transactions on Computational Biology and Bioinformatics · 2025-04-15
Article · Open access
Dietary biomarker metabolite detection is frequently studied but lacks insight into underlying biomechanism and suffers empirically from small cohorts of feeding trials. Our earlier work engineered 3 novel features to integrate prior knowledge from a genome-scale metabolic model with metabolomes to suggest diet-relevant underlying metabolic reactions and subsystems and improve predictive modeling. This study extends our earlier work by inspecting the impact of using reaction and subsystem features together, the effect of prior knowledge volume on diet assessment, and the robustness of proposed features for multi-diet assessment. We also propose a new feature in this work. We notice several experimental settings perform better with reaction and subsystem features together. We see that diet assessment can improve with higher volumes of prior, but the volume often becomes irrelevant as long as some amount of prior is used. We show our features generalize well for multi-diet assessment.
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Journal of the American Statistical Association · 2025-12-02
Article · Senior author · Corresponding author
This article addresses the challenge of offline policy learning in continuous action spaces when unmeasured confounders are present. While most existing research focuses on policy evaluation within partially observable Markov decision processes (POMDPs) and assumes discrete action spaces, we advance this field by establishing a novel identification result to enable the nonparametric estimation of policy value for a given target policy under an infinite-horizon framework. Leveraging this identification, we develop a minimax estimator and introduce a policy-gradient-based algorithm to identify the in-class optimal policy that maximizes the estimated policy value. Furthermore, we provide theoretical results regarding the consistency, finite-sample error bound, and regret bound of the resulting optimal policy. Extensive simulations and a real-world application using the German Family Panel data demonstrate the effectiveness of our proposed methodology. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
Journal of the Heart Valve Society · 2025-04-01
Article
Background Myxomatous mitral valve disease (MMVD) is a degenerative disorder marked by excess tissue fibrosis and matrix remodeling, leading to leaflet prolapse. While recent studies linked transforming growth factor beta (TGF-β) activation to MMVD development and progression, upstream regulators of this and other modulatory/causal pathways remain largely unexplored. Objectives In this study, we utilized high-throughput sequencing to conduct unbiased analyses of mRNA and microRNA (miRNA) levels in myxomatous and healthy mitral valve tissues, aiming to uncover novel molecular mechanisms involved in MMVD. Methods We defined differentially expressed mRNA and miRNA transcripts as those displaying a fold-change > 1.5 and P < .05. Pathway analysis was performed using Ingenuity Pathway Analysis and DAVID, with key findings validated via qRT-PCR. A total of 2378 transcripts were differentially expressed between myxomatous and normal valves. Established pathways, including TGF-β signaling, were confirmed as major contributors to MMVD. Additionally, Random Forests with Boruta Feature Selection identified transcripts with a 95% likelihood of importance, and subsequent pathway analysis on this subset of genes revealed unique signaling pathways. Results Most notably, circadian rhythm disruption emerged as a novel, highly ranked pathway in MMVD. Key miRNAs, such as miR-1, miR-133a, and miR-217, were highlighted as highly relevant, with miRNA–mRNA interactions displaying distinct molecular signatures predictive of MMVD. Conclusions Collectively, this study represents the first comprehensive analysis of both miRNA and mRNA expression in MMVD, revealing both established and novel disease-associated pathways. The discovery of circadian rhythm disturbances and new regulatory miRNAs suggests promising directions for further research and potential therapeutic targets for nonsurgical treatment strategies in patients with MMVD.
Predicting Cognitive Outcome Through Nutrition and Health Markers Using Supervised Machine Learning
Journal of Nutrition · 2025-05-12 · 2 citations
Article · Open access
BACKGROUND: Machine learning (ML) use in health research is growing, yet its application to predict cognitive outcomes using diverse health indicators is underinvestigated. OBJECTIVES: We used ML models to predict cognitive performance based on a set of health and behavioral factors, aiming to identify key contributors to cognitive function for insights into potential personalized interventions. METHODS: Data from 374 adults aged 19-82 y (227 females) were used to develop ML models predicting cognitive performance (reaction time in milliseconds) on a modified Eriksen flanker task. Features included demographics, anthropometric measures, dietary indices (Healthy Eating Index, Dietary Approaches to Stop Hypertension, Mediterranean, and Mediterranean-Dietary Approaches to Stop Hypertension Intervention for Neurodegenerative Delay), self-reported physical activity, and systolic and diastolic blood pressures. The data set was split (80:20) for training and testing. Predictive models (decision trees, random forest, AdaBoost, XGBoost, gradient boosting, linear, ridge, and lasso regression) were used with hyperparameter tuning and cross-validation. Feature importance was calculated using permutation importance, whereas performance was assessed using mean absolute error (MAE) and mean squared error. RESULTS: ). Age was the most significant feature (score: 0.208), followed by diastolic blood pressure (0.169), BMI (0.079), systolic blood pressure (0.069), and Healthy Eating Index (0.048). Ethnicity (0.005) and sex (0.003) had minimal predictive effect. CONCLUSIONS: Age, blood pressure, and BMI show strong associations with cognitive performance, whereas diet quality has a subtler effect. These findings highlight the potential of ML models for developing personalized interventions and preventive strategies for cognitive decline.
Science Translational Medicine · 2025-08-06 · 6 citations
Article · Open access
Each year in the United States, ~50% of adults ≥18 years old are vaccinated against influenza viruses, with protective efficacy averaging 40.5% over the past 20 years. To model annual seasonal influenza, a cohort of 74 adults, who were unscreened for preexisting A/H1N1 immunity and half of whom were recently immunized with licensed QIV (mean of 64 days), were challenged with A/H1N1 influenza virus. Transcriptomic, proteomic, and VDJ repertoire analyses were performed on nasal and peripheral blood samples from participants to identify nasal mucosal and systemic immune responses that correlated with viral shedding and immune correlates of protection. Viral-shedding participants showed increased T cell, but not B cell, VDJ diversity with expansion of low-frequency B cell clones postchallenge, including broadly neutralizing motifs. Nonshedding participants demonstrated decreased clonality and increased richness of B and T cell VDJ clones, increased preinoculation nasal mucosal immune gene and serum protein expression, and increased ex vivo peripheral blood mononuclear cell responses. Nasal mucosal responses in participants shedding virus for 2 or more days showed higher early viral loads and exhibited stronger induction of antiviral responses compared with those in participants who shed virus for 1 day. Last, participants with a single day of viral shedding were three times more likely to be female. These data shed light on the complex immune responses in the nasal mucosa and the periphery after influenza vaccination and infection, which will be critical for next-generation vaccine development.
PLoS ONE · 2025-04-09 · 4 citations
Article · Open access
Active tuberculosis (TB) is caused by Mycobacterium tuberculosis (Mtb) bacteria and is characterized by multiple phases of infection, leading to difficulty in diagnosing and treating infected individuals. Patients with latent tuberculosis infection (LTBI) can reactivate to the active phase of infection following perturbation of the dynamic bacterial and immunological equilibrium, which can potentially lead to further Mtb transmission. However, current diagnostics often lack specificity for LTBI and do not inform on TB reactivation risk. We hypothesized that immune profiling of readily available QuantiFERON-TB Gold Plus (QFT) plasma supernatant samples could improve LTBI diagnostics and infer risk of TB reactivation. We applied a whispering gallery mode, silicon photonic microring resonator biosensor platform to simultaneously quantify thirteen host proteins in QFT-stimulated plasma samples. Using machine learning algorithms, the biomarker concentrations were used to classify patients into relevant clinical bins for LTBI diagnosis or TB reactivation risk based on clinical evaluation at the time of sample collection. We report accuracies of over 90% for stratifying LTBI+ from LTBI- patients and accuracies reaching over 80% for classifying LTBI+ patients as being at high or low risk of reactivation. Our results suggest a strong reliance on a subset of biomarkers from the multiplexed assay, specifically IP-10 for LTBI classification and IL-10 and IL-2 for TB reactivation risk assessment. Taken together, this work introduces a 45-minute, multiplexed biomarker assay into the current TB diagnostic workflow and provides a single method capable of classifying patients by LTBI status and TB reactivation risk, which has the potential to improve diagnostic evaluations, personalize treatment and management plans, and optimize targeted preventive strategies in Mtb infections.
Frequent coauthors
- Marvin S. Swartz · American Society of Law, Medicine and Ethics · 51 shared
- Alan R. Ellis · North Carolina State University · 51 shared
- Kristen Hassmiller Lich · 51 shared
- Elizabeth M. La · 51 shared
- J Morrissey · 51 shared
- Rebecca Wells · St George's, University of London · 49 shared
- Wenzhuo Zhou · 20 shared
- David J. Baer · 19 shared
Labs
Not provided
Education
- 2013 · Ph.D., Biostatistics · University of North Carolina at Chapel Hill