Resume-aware faculty matching

Find professors who actually fit you

Upload your resume. Four AI agents analyze your background, rank the faculty who fit, inspect their recent research, and help you draft outreach — grounded in their actual work, not templates.

Free to start · No credit card · Cancel anytime

Jiawen Hou

Math Fellow · Verified

University of Minnesota · Mathematics

Active 2010–2026

h-index: 18
Citations: 1.5k
Papers: 86 · 48 in last 5y

Research topics

  • Medicine
  • Computer science
  • Chemistry
  • Internal medicine
  • Materials science

Selected publications

  • FolpsD: combining EFT and phenomenological approaches for joint power spectrum and bispectrum analyses

    ArXiv.org · 2026-04-10

    article · Open access

    We present a theoretical model for the power spectrum and bispectrum of galaxy clustering that exploits the complementarity between small-scale power spectrum information and large-scale bispectrum measurements. We extend the FOLPS code by combining its one-loop EFT galaxy power spectrum with a tree-level galaxy bispectrum projected onto the tripolar spherical harmonics (Sugiyama) basis. To access additional small-scale information, we also consider a line-of-sight damping factor in both statistics, mirroring approaches commonly used in studies of redshift-space distortions. We test the model using DESI DR2 galaxy mocks. Even without damping, the joint analysis of the EFT power spectrum and bispectrum significantly improves constraints and reduces parameter degeneracies relative to power spectrum analyses alone. For LRG-like samples, including the damping further extends the range beyond $k\sim 0.3 \,h \text{Mpc}^{-1}$ in the power spectrum and $k \sim 0.24 \,h \text{Mpc}^{-1}$ in the bispectrum without introducing statistically significant parameter biases. This leads to up to $\sim 30\%$ tighter constraints on $A_s$ and $\omega_{cdm}$. For low signal-to-noise tracers such as QSOs, however, the damping parameters are weakly constrained and can absorb noise fluctuations, leading to shifts in inferred parameters. Similar limitations may arise in models where cosmological information is encoded in power-spectrum shape features degenerate with the damping, such as scenarios with massive neutrinos. In contrast, for $w_0w_a$CDM we obtain $15\%$ and $21\%$ tighter constraints on $w_0$ and $w_a$, respectively, yielding a deviation from constant dark energy at slightly more than the $1\sigma$ level using full-shape information alone. The code is publicly available at https://github.com/cosmodesi/FolpsD

  • Estimate renal cell carcinoma recurrence rates using electronic health records

    ESMO Real World Data and Digital Oncology · 2026-04-14

    article · Open access · 1st author · Corresponding

    Background: Lack of readily available recurrence data has limited the use of electronic health records (EHR) for risk assessment of cancer recurrence and optimal patient management. This study aims to derive high-quality EHR recurrence data and estimate recurrence rates in overall population and specific subgroups. Materials and methods: Using EHR data between 1 January 2000 and 1 September 2022, we developed a computational tool for automatically annotating the renal cell carcinoma (RCC) recurrence outcome and a natural language processing (NLP) tool for extracting key RCC characteristics. Using data constructed from stage I-III RCC patients who underwent nephrectomy at Mass General Brigham (2000-2022), we analyzed recurrence rates by TNM (tumor-node-metastasis) stage, grade, and histological subtype. Analyses were conducted from 1 September 2022 to 16 August 2024. Results: A total of 5603 patients whose EHR met the eligibility criteria were included in the study [3590 (64%) men, 2013 (36%) women; median age at baseline 62 years (range 36-87 years); 4225 (75%) non-Hispanic white, 1378 (25%) other race-ethnicity. Tumor stage was as follows: 3324 (59%) stage I, 778 (14%) stage II, and 128 (2%) stage III, 1373 (25%) missing stage information]. Among patients with TNM stage T1-3 N0M0 clear-cell RCC any grade, EHR-derived recurrences were indicative for true recurrence with area under the receiver operating characteristic curve (AUC) of 0.914 for 5-year recurrence status cross-validated against expert annotated gold standard recurrence times. The estimated overall 5-year recurrence rate was 11.1%. We observe a substantially higher recurrence risk for T3 group (48.8%) versus T1 (2.8%) or T2 (14.2%) and G4 group (45.3%) versus G1 (3.7%), G2 (6.8%), or G3 (18.9%). Conclusions: Our computational approach demonstrates that high-quality recurrence data can be reliably extracted from EHR systems, providing a scalable solution for real-world RCC risk determination. 
These tools enable health care systems to better identify high-risk patients and potentially guide personalized follow-up strategies and adjuvant treatment options.

  • Inferring Rheumatoid Arthritis Disease Activity Status From the Electronic Health Records Across Health Systems

    Arthritis & Rheumatology · 2026-04-13

    article · Open access

    OBJECTIVE: Disease activity plays a central role in rheumatoid arthritis (RA) clinical studies. The inconsistent availability of data on disease activity in real-world electronic health records (EHRs) data has limited the ability to generate real-world evidence (RWE). This study aimed to develop and validate scalable machine learning (ML) models to infer RA disease activity from EHR data. METHODS: We used EHR data from Mass General Brigham (MGB) and the Department of Veterans Affairs (VA) linked with RA registries that prospectively collected the Disease Activity Score with 28-joint counts (DAS28). Features for the algorithm were extracted from the EHRs including structured data (eg, diagnosis codes and narrative data using natural language processing [NLP]). ML models were trained on the registry-collected DAS28. The performance of models trained within the same institution and across institutions was evaluated. To assess face validity, we estimated the association between inferred disease activity and major adverse cardiovascular events (MACEs) with stratified Cox models. RESULTS: We studied 1,105 MGB and 2,631 VA patients with RA. Models with structured data achieved an area under the receiver operating curve (AUC) of 0.68 to 0.70; models incorporating structured and NLP achieved higher performance (MGB, AUC = 0.843; VA, AUC = 0.833). Cross-institution validation demonstrated limited transportability of algorithms across sites (MGB→VA, AUC = 0.679; VA→MGB, AUC = 0.718). Within the same institution, inferred disease activity was significantly associated with increased risk for incident MACEs (MGB, hazard ratio [HR] = 1.12; VA, HR = 1.14). CONCLUSION: RA disease activity can be inferred at scale from within-institution EHR data, though cross-institution performance is limited. The inferred disease activity replicated known associations with MACEs, and the results support its use in future studies to generate RWE.

  • Antidiabetic Drug Associations With Heart Failure Outcomes: Real-World Evidence Study Using Electronic Health Records

    JMIR Diabetes · 2026-04-15

    article · Open access · Senior author

    Background: Patients with type 2 diabetes mellitus (T2D) have a higher risk of cardiovascular disease, including heart failure (HF), leading to health care burden including hospitalization and mortality. Among multiple T2D therapies, there are inadequate head-to-head comparisons of their effects on HF in the real-world patient population. Objective: This study aims to compare the time-to-HF among patients treated with different T2D drugs following metformin. Methods: We conducted a retrospective data analysis on electronic health records of 5000 patients with T2D. The inclusion criteria were previous treatment with metformin and initiation of glucagon-like peptide-1 receptor agonists (GLP1 RAs), dipeptidyl peptidase-4 inhibitors (DPP4i), sulfonylureas, or insulin. We grouped patients by the mechanism of their subsequent therapies and focused on 2 pairs of comparisons classified by insulin resistance: sulfonylureas versus insulin (increased resistance) and GLP1 RA versus DPP4i (decreased resistance). The outcomes were 5-year HF status and the HF-free survival time, which was verified manually by examining clinical notes. We applied doubly robust causal estimation and accounted for confounding by adjusting for coded and natural language processing electronic health record features identified through medical knowledge networks. Results: The study included 939 patients, of whom 204 (21.7%) received insulin, 482 (51.3%) received sulfonylureas, 90 (9.6%) received GLP1 RA, and 163 (17.4%) received DPP4i. Patients in the sulfonylureas group had a significantly higher 5-year HF-free survival compared to the insulin group (survival ratio of insulin/sulfonylureas 0.902, 95% CI 0.840-0.976; P=.01). There was no significant difference between the DPP4i versus GLP1 RA group in 5-year HF-free survival (survival ratio of GLP1 RA/DPP4i was 0.953, 95% CI 0.849-1.067; P=.40). 
For the occurrence of a HF-related hospitalization within 5 years, there were no significant differences between the sulfonylureas and insulin groups (risk difference 0.057, 95% CI -0.011 to 0.132; P=.11), and between the GLP1 RA and DPP4i groups (risk difference 0.010, 95% CI -0.096 to 0.129). Conclusions: We evaluated real-world evidence on 2 head-to-head comparisons of second-line T2D therapies on 5-year HF outcomes. Patients on sulfonylureas were associated with lower 5-year HF risks than those treated with insulin when measured by risk ratio, but no significant difference was detected when measured by the risk difference. Limitations of this study included potentially inadequate adjustment of confounding in the observational study and a limited sample size with validated HF outcomes.

  • 62.6 GHz ScAlN solidly mounted acoustic resonators

    Applied Physics Letters · 2026-01-26

    article · Open access

    We demonstrate a record-high 62.6 GHz solidly mounted acoustic resonator (SMR) incorporating a 67.6 nm scandium aluminum nitride (Sc0.3Al0.7N) piezoelectric layer on a 40 nm buried platinum (Pt) bottom electrode, positioned above an acoustic Bragg reflector composed of alternating SiO2 (28.2 nm) and Ta2O5 (24.3 nm) layers in 8.5 pairs. The Bragg reflector and piezoelectric stack above are designed to confine a third-order thickness-extensional bulk acoustic wave mode, while efficiently transducing with thickness-field excitation. The fabricated SMR exhibits an extracted piezoelectric coupling coefficient (k2) of 0.8% and a maximum Bode quality factor (Q) of 51 at 63 GHz, representing the highest operating frequency reported for an SMR to date. These results establish a pathway toward mmWave SMR devices for filters and resonators in next-generation RF front ends.

  • Dependence and risk spillover effects between clean energy stocks and related assets——an empirical study based on asymmetric W-TVP-VAR model

    Applied Economics · 2025-05-21 · 2 citations

    article · 1st author

  • Federated Adaptive Causal Estimation (FACE) of Target Treatment Effects

    Journal of the American Statistical Association · 2025-01-21 · 13 citations

    article · Open access

    Federated learning of causal estimands may greatly improve estimation efficiency by leveraging data from multiple study sites, but robustness to heterogeneity and model misspecifications is vital for ensuring validity. We develop a Federated Adaptive Causal Estimation (FACE) framework to incorporate heterogeneous data from multiple sites to provide treatment effect estimation and inference for a flexibly specified target population of interest. FACE accounts for site-level heterogeneity in the distribution of covariates through density ratio weighting. To safely incorporate source sites and avoid negative transfer, we introduce an adaptive weighting procedure via a penalized regression, which achieves both consistency and optimal efficiency. Our strategy is communication-efficient and privacy-preserving, allowing participating sites to share summary statistics only once with other sites. We conduct both theoretical and numerical evaluations of FACE and apply it to conduct a comparative effectiveness study of BNT162b2 (Pfizer) and mRNA-1273 (Moderna) vaccines on COVID-19 outcomes in U.S. veterans using electronic health records from five VA regional sites. We show that compared to traditional methods, FACE meaningfully increases the precision of treatment effect estimates, with reductions in standard errors ranging from 26% to 67%.

  • Advancing the Use of Longitudinal Electronic Health Records: Tutorial for Uncovering Real-World Evidence in Chronic Disease Outcomes (Preprint)

    2025-01-28

    preprint

    Managing chronic diseases requires ongoing monitoring of disease activity and therapeutic responses to optimize treatment plans. With the growing availability of disease-modifying therapies, it is crucial to investigate comparative effectiveness and long-term outcomes beyond those available from randomized clinical trials. We introduce a comprehensive pipeline for generating reproducible and generalizable real-world evidence on disease outcomes by leveraging electronic health record data. The pipeline first generates scalable disease outcomes by linking electronic health record data with registry data containing a small sample of labeled outcomes. It then applies causal analysis using these scalable outcomes to evaluate therapies for chronic diseases. The implementation of the pipeline is illustrated in a case study based on multiple sclerosis. Our approach addresses challenges in real-world evidence generation for disease activity of chronic conditions, specifically the lack of direct observations on key outcomes and biases arising from imperfect or incomplete data. We present advanced machine learning techniques such as semisupervised and ensemble methods to impute missing outcome data, further incorporating steps for calibrated causal analyses and bias correction.

  • Inferring rheumatoid arthritis disease activity status from the electronic health records across health systems to enable real-world data studies

    medRxiv · 2025-11-17

    preprint · Open access

    Objective: Disease activity plays a central role in rheumatoid arthritis (RA) clinical studies. However, RA disease activity is inconsistently recorded in real-world electronic health records (EHR) data, limiting the generation of real-world evidence (RWE). This study aimed to develop and validate scalable machine learning (ML) models to infer RA disease activity from EHR data. Methods: We conducted studies from EHR data from Mass General Brigham (MGB) and the Veterans Affairs (VA); both have RA registries with prospectively collected disease activity score 28 (DAS28). The features for the algorithm were extracted from the EHR including structured data, e.g., ICD codes and narrative data using natural language processing (NLP). Machine learning models were trained on the registry-collected DAS28. We tested within-institution trained model performance and across-system transportability. The association between inferred disease activity and major adverse cardiovascular events (MACE) was tested with stratified Cox models to test face-validity. Results: We studied 1105 MGB and 2631 VA RA patients. Models with structured data achieved an AUC of 0.68-0.70; models incorporating structured and NLP achieved higher performance (AUC=0.843, MGB; 0.833, VA). Cross-site validation demonstrated reduced transportability (AUC=0.679, MGB→VA; 0.718, VA→MGB), due to differences in the important features. Within institution, inferred disease activity was significantly associated with increased risk for incident MACE (MGB: HR=1.12; VA: HR=1.14). Conclusion: RA disease activity can be inferred at scale from within-institution EHR data, though cross-institution performance is limited. The inferred disease activity replicated the association between RA and MACE and supports its use in future studies to generate RWE.

  • Ocrelizumab versus Natalizumab in Relapsing-Remitting Multiple Sclerosis: A Registry-Linked Electronic Health Records Study

    medRxiv · 2025-12-02

    preprint · Open access

    BACKGROUND: Ocrelizumab and natalizumab are commonly prescribed high-effectiveness disease-modifying therapies (DMTs) for relapsing-remitting multiple sclerosis (RRMS). However, no randomized clinical trial and few real-world studies have directly compared their effectiveness in reducing disability progression. Subtype classification and disability status are critical for multiple sclerosis (MS) research, but these data are often missing in electronic health records (EHRs), limiting robust real-world evidence generation. OBJECTIVE: To compare the effectiveness of ocrelizumab and natalizumab in two-year rater-assessed disability progression among RRMS patients using longitudinal registry-linked EHR data. DESIGN: Retrospective cohort study. SETTING: A large healthcare system that includes both academic and community practices. PARTICIPANTS: Patients diagnosed with MS who initiated ocrelizumab or natalizumab between 2012 and 2020, with at least 6-month EHR data before treatment initiation and no prior exposure to other high-effectiveness DMTs. EXPOSURES: Treatment with ocrelizumab vs natalizumab. MEASUREMENTS: We developed an ensemble machine learning model to impute RRMS subtype and disability outcomes using structured and narrative EHR data. The primary outcome was moderate/severe rater-assessed disability at 2 years (observed or imputed Expanded Disability Status Scale [EDSS]≥4) after treatment initiation. We estimated the average treatment effects using semi-supervised doubly robust approach with comprehensive confounder adjustment and calibration to mitigate imputation bias. Covariates included standard demographic and clinical features such as baseline disability as well as knowledge graph-selected features. Sensitivity analyses used observed EDSS scores in registry-derived RRMS patients. Exploratory analyses included rituximab, another B-cell-depleting therapy, with adjustments for differences in patient profiles. 
RESULTS: Among RRMS patients, those treated with ocrelizumab (n=543) had a significantly lower two-year risk of moderate/severe disability compared with those treated with natalizumab (n=205) based on imputed outcomes (risk difference, -5.87%; 95% CI: -11.28% to -0.46%; p=0.033) after confounder adjustment. Sensitivity analyses yielded consistent findings using imputed or observed EDSS outcomes in registry-derived RRMS patients. CONCLUSION AND RELEVANCE: In this real-world comparative effectiveness study using a novel semi-supervised doubly-robust framework, ocrelizumab was associated with a lower risk of disability progression than natalizumab among RRMS patients. This approach provides a roadmap for generating robust large-scale real-world evidence in settings of missing key inclusion features and outcomes.

Frequent coauthors

  • Charles P. Lin

    Center for Systems Biology

    34 shared
  • Eric O. Potma

    University of California, Irvine

    32 shared
  • Bruce J. Tromberg

    28 shared
  • Tianxi Cai

    Harvard University

    19 shared
  • Jayaraj Rajagopal

    Bharathidasan University

    15 shared
  • Giuseppe Intini

    McGowan Institute for Regenerative Medicine

    15 shared
  • Mihaela Balu

    University of California, Irvine

    13 shared
  • Tianrun Cai

    Brigham and Women's Hospital

    13 shared

Education

  • Doctor of Philosophy in Mathematics with a specialization in Statistics, Mathematics

    University of California San Diego

    2019
  • Master of Science in Statistics, Statistics

    University of Illinois at Urbana-Champaign

    2013
  • Bachelor of Mathematics, Mathematics

    Fudan University

    2011