
Maya Petersen
MD, PhD · Professor, Epidemiology and Biostatistics · University of California, Berkeley · Biostatistics
Active 2002–2025
About
Maya L. Petersen, MD, PhD, is a Professor of Biostatistics and Epidemiology at UC Berkeley and of Computational Precision Health at UCSF. She is the co-Director of the UCSF-UC Berkeley Joint Program in Computational Precision Health and the co-Director of UC Berkeley’s Center for Targeted Machine Learning and Causal Inference. Her methodological research focuses on the intersection of AI, statistical inference, and causal inference, emphasizing complex observational and experimental data, individualized treatment strategies, and adaptive study designs. Dr. Petersen uses these methods to improve healthcare delivery globally and domestically, leading randomized trials and observational evaluations to deploy insights and measure their impact. Her work aims to develop and apply novel causal inference methods to problems in health, community-based interventions, and HIV treatment and prevention.
Research topics
- Medicine
- Environmental health
- Social Science
- Immunology
- Sociology
- Virology
- Genetics
- Socioeconomics
- Nursing
- Geography
- Psychology
- Family medicine
- Internal medicine
- Pathology
- Biology
Selected publications
Statistics in Medicine · 2025-12-01
article · Open access · Senior author
We propose a novel causal estimand that elucidates how response to an earlier treatment (e.g., treatment initiation) modifies the effect of a later treatment (e.g., treatment discontinuation), thus learning if there are effects among the (un)affected. Specifically, we consider a working marginal structural model summarizing how the average effect of a later treatment varies as a function of the (estimated) conditional average effect of an earlier treatment. We define the estimand to be a data-adaptive causal parameter, allowing for estimation of the conditional average treatment effect using machine learning without making strong smoothness assumptions. We show how a sequentially randomized design can be used to identify this causal estimand, and we describe a targeted maximum likelihood estimator for the resulting statistical estimand, with influence curve-based inference. We present simulation studies that evaluate the performance of this estimator under various finite-sample scenarios. Throughout, we use the "Adaptive Strategies for Preventing and Treating Lapses of Retention in HIV Care" trial (NCT02338739) as an illustrative example, showing that discontinuation of conditional cash transfers for HIV care adherence was most harmful among those who had an increase in benefit from them initially.
medRxiv · 2025-08-05
preprint · Open access
Alcohol-serving venues are high-risk sites for HIV transmission in East Africa. Understanding how venue characteristics influence HIV screening outcomes may help to target venue-based outreach. In eight rural communities in Kenya and Uganda, we mapped all alcohol-serving venues (N=530) and invited owners to participate in a cluster-randomized trial to promote biomedical HIV prevention uptake: 527 (99%) owners agreed to participate. We distributed cards recruiting adults (≥18 years) for free HIV testing with rapid initiation of HIV biomedical prevention or treatment. We characterized the yield of venue-based recruitment and evaluated venue-level correlates of being newly diagnosed with HIV, being previously diagnosed but out-of-care, and having self-reported HIV risk. Of 480 participating venues (Kenya=89, Uganda=391; 41 closed pre-recruitment and 6 had no one present), 61 (13%) had rooms for sex work; 91 (19%) offered condoms, and the median patrons/venue was 10/weekend-day. Staff distributed 9,375 cards and 7,744 (83%) adults participated in HIV screening. Of those screened, the median age was 34 years (IQR:26-43), 62% were men, and 1,620 (21%) had HIV. Among persons without known HIV, 141/6,265 (2.3%) were newly-diagnosed with HIV. Among persons with known HIV, 78/1,479 (5.3%) were out-of-care. Among persons without HIV, 2,285/6,124 (37%) reported HIV risk. The odds of having newly-diagnosed HIV increased significantly with each additional patron/weekend-day at a given venue (adjusted odds ratio [aOR]=1.03, 95%CI:1.00-1.05, p=0.025). The odds of being previously diagnosed but out-of-care were significantly lower among attendees at venues with condoms on site (aOR=0.39, 95%CI:0.16-0.99, p=0.047). The odds of reporting HIV risk were significantly higher among attendees at venues with condoms (aOR=1.25, 95%CI:1.04-1.49, p=0.015), more patrons/weekday (aOR=1.01, 95%CI:1.00-1.02, p=0.022), and more barmaids (aOR=1.07, 95%CI:1.01-1.13, p=0.013). Alcohol-serving venue characteristics were predictive of the yield of persons with untreated HIV or high HIV risk, and could aid programs in targeting venues for HIV prevention and treatment.
Journal of Causal Inference · 2025-01-01
article · Open access
Augmenting a randomized controlled trial (RCT) with external data may increase power at the risk of introducing bias. To select and analyze the experiment (RCT alone or combined with external data) with the optimal bias-variance tradeoff, we develop a novel experiment-selector cross-validated targeted maximum likelihood estimator for randomized-external data studies (ES-CVTMLE). This estimator utilizes two estimates of bias to determine whether to integrate external data based on (1) a function of the difference in conditional mean outcome under control between the RCT and combined experiments and (2) an estimate of the average treatment effect on a negative control outcome. We define the asymptotic distribution of the ES-CVTMLE under varying magnitudes of bias and construct confidence intervals by Monte Carlo simulation. We evaluate ES-CVTMLE compared to three other data fusion estimators in simulations and demonstrate the ability of ES-CVTMLE to distinguish biased from unbiased external controls in a real data analysis of the effect of liraglutide on glycemic control from the LEADER trial. The ES-CVTMLE has the potential to improve power while providing relatively robust inference for future hybrid RCT-external data studies.
Targeted maximum likelihood based estimation for longitudinal mediation analysis
Journal of Causal Inference · 2025-01-01
article · Open access
Causal mediation analysis with random interventions has become an area of significant interest for understanding time-varying effects with longitudinal and survival outcomes. To tackle causal and statistical challenges due to the complex longitudinal data structure with time-varying confounders, competing risks, and informative censoring, there exists a general desire to combine machine learning techniques and semiparametric theory. In this article, we focus on targeted maximum likelihood estimation (TMLE) of longitudinal natural direct and indirect effects defined with random interventions. The proposed estimators are multiply robust, locally efficient, and directly estimate and update the conditional densities that factorize data likelihoods. We utilize the highly adaptive lasso (HAL) and projection representations to derive new estimators (HAL-EIC) of the efficient influence curves (EICs) of longitudinal mediation problems and propose a fast one-step TMLE algorithm using HAL-EIC while preserving the asymptotic properties. The proposed method can be generalized for other longitudinal causal parameters that are smooth functions of data likelihoods, and thereby provides a novel and flexible statistical toolbox.
ArXiv.org · 2025-12-15 · 1 citation
preprint · Open access · Senior author
Covariate adjustment is an approach to improve the precision of trial analyses by adjusting for baseline variables that are prognostic of the primary endpoint. Motivated by the SEARCH Universal HIV Test-and-Treat Trial (2013-2017), we tell our story of developing, evaluating, and implementing a machine learning-based approach for covariate adjustment. We provide the rationale for as well as the practical concerns with such an approach for estimating marginal effects. Using schematics, we illustrate our procedure: targeted machine learning estimation (TMLE) with Adaptive Pre-specification. Briefly, sample-splitting is used to data-adaptively select the combination of estimators of the outcome regression (i.e., the conditional expectation of the outcome given the trial arm and covariates) and known propensity score (i.e., the conditional probability of being randomized to the intervention given the covariates) that minimizes the cross-validated variance estimate and, thereby, maximizes empirical efficiency. We discuss our approach for evaluating finite sample performance with parametric and plasmode simulations, pre-specifying the Statistical Analysis Plan, and unblinding in real-time on video conference with our colleagues from around the world. We present the results from applying our approach in the primary, pre-specified analysis of 8 recently published trials (2022-2024). We conclude with practical recommendations and an invitation to implement our approach in the primary analysis of your next trial.
PLoS Medicine · 2025-01-24 · 6 citations
article · Open access · Corresponding author
BACKGROUND: Cardiovascular disease (CVD) morbidity and mortality is increasing in Africa, largely due to undiagnosed and untreated hypertension. Approaches that leverage existing primary health systems could improve hypertension treatment and reduce CVD, but cost-effectiveness is unknown. We evaluated the cost-effectiveness of population-level hypertension screening and implementation of chronic care clinics across eastern, southern, central, and western Africa. METHODS AND FINDINGS: We conducted a modeling study to simulate hypertension and CVD across 3,000 scenarios representing a range of settings across eastern, southern, central, and western Africa. We evaluated 2 policies compared to current hypertension treatment: (1) expansion of HIV primary care clinics into chronic care clinics that provide hypertension treatment for all persons regardless of HIV status (chronic care clinic or CCC policy); and (2) CCC plus population-level hypertension screening of adults ≥40 years of age by community health workers (CHW policy). For our primary analysis, we used a cost-effectiveness threshold of US $500 per disability-adjusted life-year (DALY) averted, a 3% annual discount rate, and a 50-year time horizon. A strategy was considered cost-effective if it led to the lowest net DALYs, which is a measure of DALY burden that takes account of the DALY implications of the cost for a given cost-effectiveness threshold. Among adults 45 to 64 years, CCC implementation would improve population-level hypertension control (the proportion of people with hypertension whose blood pressure is controlled) from mean 4% (90% range 1% to 7%) to 14% (6% to 26%); additional CHW screening would improve control to 44% (35% to 54%). Among all adults, CCC implementation would reduce ischemic heart disease (IHD) incidence by 10% (3% to 17%), strokes by 13% (5% to 23%), and CVD mortality by 9% (3% to 15%). CCC plus CHW screening would reduce IHD by 28% (19% to 36%), strokes by 36% (25% to 47%), and CVD mortality by 25% (17% to 34%). CHW screening was cost-effective in 62% of scenarios, CCC in 31%, and neither policy was cost-effective in 7% of scenarios. Pooling across setting-scenarios, incremental cost-effectiveness ratios were $69/DALY averted for CCC and $389/DALY averted adding CHW screening to CCC. CONCLUSIONS: Leveraging existing healthcare infrastructure to implement population-level hypertension screening by CHWs and hypertension treatment through integrated chronic care clinics is expected to reduce CVD morbidity and mortality and is likely to be cost-effective in most settings across Africa.
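The net-DALY decision rule and the incremental cost-effectiveness ratio (ICER) mentioned in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' model: it assumes net DALYs are computed as DALYs incurred plus cost divided by the threshold, which is one common reading of the definition quoted above. The `Strategy` class, the function names, and all cost and DALY inputs are hypothetical; the inputs are invented and chosen only so that the two ratios echo the pooled ICERs quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    cost: float   # total discounted cost in US$ (invented, not study data)
    dalys: float  # total discounted DALYs incurred (invented, not study data)


def net_dalys(s: Strategy, threshold: float) -> float:
    """DALY burden plus the DALY-equivalent of spending at the given threshold.

    Assumed formula: net DALYs = DALYs + cost / threshold.
    """
    return s.dalys + s.cost / threshold


def icer(new: Strategy, old: Strategy) -> float:
    """Incremental cost per DALY averted when moving from `old` to `new`."""
    return (new.cost - old.cost) / (old.dalys - new.dalys)


THRESHOLD = 500.0  # US$ per DALY averted, as in the abstract's primary analysis

# Hypothetical per-population totals for a single scenario.
current = Strategy("current", cost=0.0, dalys=1000.0)
ccc = Strategy("CCC", cost=13_800.0, dalys=800.0)
ccc_chw = Strategy("CCC+CHW", cost=52_700.0, dalys=700.0)

strategies = [current, ccc, ccc_chw]

# The preferred strategy is the one with the lowest net DALYs.
best = min(strategies, key=lambda s: net_dalys(s, THRESHOLD))

icer_ccc = icer(ccc, current)   # CCC versus current treatment
icer_chw = icer(ccc_chw, ccc)   # adding CHW screening to CCC
```

With these invented inputs the rule selects the combined strategy; in the study itself the preferred strategy varied across the 3,000 simulated scenarios.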
PLOS Global Public Health · 2025-05-12 · 1 citation
article · Open access
Gaps in HIV RNA monitoring persist globally impeding the ability to determine clinical progress and outcomes. This study systematically evaluated provider (e.g., guideline non-adherence), system (e.g., laboratory error) and participant-level (e.g., refusal) drivers of missed viral load (VL) monitoring measurements among people with HIV in Kenya. Adults aged 18-65 years were followed across five health facilities in Kenya as part of a clinical trial (NCT#02338739) where HIV RNA monitoring was done routinely. Instances of missed VL despite being indicated per Kenyan guidelines were identified. An algorithm for assessing root causes of missing HIV RNA was developed and generalized linear models estimated the risk ratios (RR) for participant-level characteristics associated with missed viral load. Among 1,754 participants (66% female), the prevalence of missed viral load in year one and two was 24.4% and 29.4%, respectively. Drivers for missed viral load measurements included loss to follow up (51.5% in year one and 57.8% in year two), clinician non-adherence with guidelines (36.7% in year one and 32.2% in year two), unknown (10.3% in year one and 8.6% in year two), and requested but not collected (1.5% in year one and 1.3% in year two). Participants aged < 24 years (RR 2.27, 95% CI: 1.66-3.12), those with higher socioeconomic status (RR 1.47, 95% CI: 1.03-1.91), receiving HIV treatment at a rural clinic (RR 1.22, 95% CI: 1.02-1.46) and with advanced HIV disease (RR 2.39, 95% CI: 1.52-3.73) were more likely to miss VL monitoring. Missed routine viral load monitoring remains high, primarily due to loss to follow-up, and may substantially alter suppression estimates. Sustainable approaches to keep people with HIV engaged in care, alongside strengthening providers' clinical practices and alignment with national guidelines, are necessary for optimizing viral monitoring and accurately assessing viral suppression within public health systems.
Journal of Causal Inference · 2025-01-01
article · Open access · Senior author
The Causal Roadmap is a formal framework for causal and statistical inference that supports clear specification of the causal question, interpretable and transparent statement of required causal assumptions, robust inference, and optimal precision. The Roadmap is thus particularly well suited to evaluating longitudinal causal effects using large-scale registries; however, application of the Roadmap to registry data also introduces particular challenges. In this article, we provide a detailed case study of the longitudinal Causal Roadmap applied to the Danish National Registry to evaluate the comparative effectiveness of second-line diabetes drugs on dementia risk. Specifically, we evaluate the difference in counterfactual 5-year cumulative risk of dementia if a target population of adults with type 2 diabetes had initiated and remained on glucagon-like peptide-1 receptor agonists (GLP-1RAs) (a second-line diabetes drug) compared to a range of active comparator protocols. Time-dependent confounding is accounted for through use of the iterated conditional expectation representation of the longitudinal g-formula as a statistical estimand. Statistical estimation uses longitudinal targeted maximum likelihood, incorporating machine learning. We provide practical guidance on the implementation of the Roadmap using registry data and highlight how rare exposures and outcomes over long-term follow-up can raise challenges for flexible and robust estimators, even in the context of the large sample sizes provided by the registry. We demonstrate how simulations can be used to help address these challenges by supporting careful estimator pre-specification. We find a protective effect of GLP-1RAs compared to some but not all other second-line treatments.
Journal of the International AIDS Society · 2025-07-01 · 2 citations
article · Open access
INTRODUCTION: Injectable cabotegravir (CAB-LA) is highly effective for HIV prevention, but real-world implementation studies in Africa are ongoing. We assessed feasibility and acceptability among participants who used CAB-LA in the SEARCH Dynamic Choice HIV Prevention extension study in rural Uganda and Kenya. METHODS: From January 2023 to December 2024, we followed females and males who were aged ≥ 15 years, with self-assessed risk for HIV acquisition, in the intervention arm of the SEARCH Dynamic Choice HIV Prevention extension study, and received at least one CAB-LA injection during the first 48 weeks. To assess the feasibility and acceptability of CAB-LA, we designed quantitative surveys based on the Theoretical Framework for Acceptability. Surveys were administered at CAB-LA initiation, after 24 and 48 weeks of use, and discontinuation of CAB-LA. RESULTS: Of 487 intervention arm participants, 274 (56%) started CAB-LA (183 females; 91 males; 79 youth aged 15-24 years). Of whom, 264 completed the survey at initiation, 206 after 24 weeks on CAB-LA, 201 after 48 weeks on CAB-LA and 69 at discontinuation of CAB-LA. Most participants (65%; 171/264) reported choosing CAB-LA because it was easier to take than pills, and nearly all (99%; 261/264) had limited knowledge of CAB-LA prior to the study. Concerns for side effects were the largest anticipated and experienced barrier to CAB-LA. Overall and with subgroups, satisfaction with CAB-LA was high at 24 weeks (97%; 200/206) and 48 weeks (96%; 193/201). Nearly all participants reported that taking CAB-LA was easy at 24 weeks (95%; 195/206) and 48 weeks (99%; 198/201). At CAB-LA discontinuation, 83% (57/69) were likely to extremely likely to recommend CAB-LA to a friend: 80% (20/25) of males, 84% (37/44) of females, 100% (19/19) of youth and 76% (38/50) of older adults. CONCLUSIONS: In rural Uganda and Kenya, over half of participants in the SEARCH trial who were offered choice of oral PrEP/PEP or CAB-LA chose and started CAB-LA during the first 48 weeks. For both males and females and younger and older adults, CAB-LA was both feasible and acceptable to deliver with satisfaction remaining high throughout the study, and nearly all reporting ease of use. CLINICAL TRIAL NUMBER: 05549726.
Recent grants
Targeted Learning using adaptive designs for HIV Epidemic control in East Africa
NIH · $5.1M · 2007–2025
A Multisectoral Strategy to Address Persistent Drivers of the HIV Epidemic in East Africa
NIH · $23.4M · 2020–2027
Targeted Learning using adaptive designs for HIV Epidemic control in East Africa
NIH · $613k · 2007–2024
Adaptive Strategies for Preventing & Treating Lapses of Retention in Care (AdaPT)
NIH · $3.4M · 2014–2019
Frequent coauthors
- Moses R. Kamya · Makerere University · 216 shared
- Diane V. Havlir · University of California, San Francisco · 203 shared
- Edwin D. Charlebois · University of California, San Francisco · 188 shared
- Gabriel Chamie · University of California, San Francisco · 163 shared
- Tamara D. Clark · Nationwide Children's Hospital · 152 shared
- Dalsone Kwarisiima · Infectious Diseases Research Collaboration · 140 shared
- Elizabeth A. Bukusi · Kenya Medical Research Institute · 118 shared
- James Ayieko · Kenya Medical Research Institute · 113 shared
Labs
Computational Precision Health · PI
PhD and Designated Emphasis students in UCSF & UC Berkeley Computational Precision Health 2023-2024 PhD Cohort
Awards & honors
- Howard Hughes Medical Institute Pre-doctoral award
- Doris Duke Clinical Scientist Development award
- American Statistical Association national teaching award
- June 18 'Maya Petersen Day' in San Francisco