Yates Coley
Affiliate Associate Professor · University of Washington · Biostatistics
Active 2011–2025
About
Yates Coley is an Affiliate Associate Professor in the Department of Biostatistics at the University of Washington. He earned his PhD in Biostatistics from the University of Washington in 2014 and his MS in Biostatistics from the same institution in 2010. His work focuses on building equitable prediction models, with a particular interest in health care applications such as dementia risk screening and racial inequity in suicide prediction models. Coley's research aims to develop and improve statistical methods for health research, emphasizing fairness and equity in predictive modeling. His contributions have been recognized in various health care contexts, including studies on mammogram benefits and health disparities, and he has been featured in media outlets such as the Puget Sound Business Journal and UW Medicine Newsroom.
Selected publications
Behavior of prediction performance metrics with rare events
ArXiv.org · 2025-04-22
preprint · Open access
Objective: Area under the receiver operating characteristic curve (AUC) is commonly reported alongside prediction models for binary outcomes. Recent articles have raised concerns that AUC might be a misleading measure of prediction performance in the rare event setting. This setting is common since many events of clinical importance are rare. We aimed to determine whether the bias and variance of AUC are driven by the number of events or the event rate. We also investigated the behavior of other commonly used measures of prediction performance, including positive predictive value, accuracy, sensitivity, and specificity. Study Design and Setting: We conducted a simulation study to determine when or whether AUC is unstable in the rare event setting by varying the size of datasets used to train and evaluate prediction models. This plasmode simulation study was based on data from the Mental Health Research Network; the data contained 149 predictors and the outcome of interest, suicide attempt, which had an event rate of 0.92% in the original dataset. Results: Our results indicate that poor AUC behavior (as measured by empirical bias, variability of cross-validated AUC estimates, and empirical coverage of confidence intervals) is driven by the number of events in a rare-event setting, not event rate. Performance of sensitivity is driven by the number of events, while that of specificity is driven by the number of non-events. Other measures, including positive predictive value and accuracy, depend on the event rate even in large samples. Conclusion: AUC is reliable in the rare event setting provided that the total number of events is moderately large; in our simulations, we observed near zero bias with 1000 events.
Health Services Research · 2025-07-21
article · Open access · Senior author
OBJECTIVE: To compare alternative Difference-in-Differences (DID) methods for evaluating the effect of risk-stratified interventions, or interventions targeting at-risk groups, on binary outcomes. STUDY SETTING AND DESIGN: In simulations, we compared operating characteristics of recycled prediction estimators for common average treatment effect on the treated (ATT) estimands across three DID models: the traditional two groups and two periods model, a risk score adjusted model, and a model adjusting for risk score and its interactions with risk group and period. We compared DID ATT estimates to randomized evaluation estimates of a risk-stratified intervention implemented at Kaiser Permanente Washington (KPWA), delivering additional text-message reminders to reduce missed clinic visits. DATA SOURCES AND ANALYTIC SAMPLE: Our study included 588,503 KPWA visits, with 285,814 (49%) visits pre-evaluation (05/01/2018-10/30/2018) and 302,689 (51%) visits during the evaluation (02/01/2019-09/30/2019). Pre-evaluation, 120,350 visits were classified as high-risk. During the evaluation, 125,076 visits were labeled as high-risk, with 62,557 (50%) randomized to the intervention. We generated data in simulations based on this setting. PRINCIPAL FINDINGS: In simulations, the traditional DID and risk score adjusted models had smaller bias and standard errors, and better coverage probabilities. DID estimates closest to randomized evaluation estimates (-0.007, 95% CI [-0.010, -0.004]) were from the traditional DID model assuming the identity link (-0.008, 95% CI [-0.011, -0.005]) or the risk adjusted model with any link (-0.006, 95% CI [-0.008, -0.003] identity; -0.007, 95% CI [-0.011, -0.003] logit; -0.007, 95% CI [-0.012, -0.003] log) for the ATT on the absolute difference scale (usual DID ATT estimand), and the risk score adjusted model with log or logit links for all other estimands. CONCLUSIONS: Compared with randomized evaluation results, the traditional DID model is appropriate for the ATT on the absolute difference scale, while the risk score adjusted model with log or logit links is appropriate for all ATT estimands considered.
Prevention Science · 2025-10-20
article · Open access · 1st author · Corresponding
Hospitalization and death following COVID-19 infection continue to pose a major public health concern and place strain on health system resources. Outpatient antiviral medication can reduce the risk of COVID-19 hospitalization and death for those at risk of poor outcomes, but identifying high-risk populations who may benefit most from treatment is challenging. The objective of this study was to develop and validate a prediction model for the composite outcome of hospitalization or death in the 14 days following COVID-19 infection. Our sample included 67,530 COVID-19 infections documented in outpatient care and occurring between April 1, 2020, and November 1, 2022, for 64,529 Kaiser Permanente Washington patients who did not receive outpatient antiviral treatment; 1378 (2.0%) of these infections resulted in hospitalization or death. Our prediction model, estimated using logistic regression with LASSO variable selection and ridge penalization, included 19 risk factors and showed high performance, including an area under the curve of 0.825 (95% confidence interval 0.813-0.836). Among the 10% of infections with the highest risk predictions, the true positive rate was 48% (46-51%) and the positive predictive value was 9.9% (9.2-10.6%). Supplemental analyses confirmed strong model performance across racial and ethnic subgroups and over time. We also present our process for selecting a risk threshold above which to recommend antiviral treatment and discuss considerations for prospective clinical implementation. This project demonstrates that machine learning tools can be used by health systems to deliver timely, targeted secondary prevention to reduce the risk of serious illness or death.
Behavior of prediction performance metrics with rare events
Journal of Clinical Epidemiology · 2025-11-11
article
Journal of Adolescent Health · 2024-11-22 · 4 citations
article · Open access
TESTING ERADAR: A NEW EHR-BASED ALGORITHM TO INCREASE DEMENTIA RECOGNITION IN HEALTH CARE SYSTEMS
Innovation in Aging · 2024-12-01
article · Open access
Electronic health records (EHRs) hold potential for identifying individuals with undiagnosed dementia. Using machine learning, our team developed and externally validated a predictive algorithm called the EHR Risk of Alzheimer's and Dementia Assessment Rule (eRADAR) that estimates the likelihood an individual has undiagnosed dementia with high accuracy (C statistics of 0.79 to 0.84). Through two embedded pragmatic trials, we are implementing eRADAR in 11 primary care clinics within Kaiser Permanente Washington and University of California, San Francisco. The target population is older adults age 65+ without a documented dementia diagnosis or medication. Primary care providers (PCPs) are randomized to intervention or usual care. The eRADAR algorithm is used to identify individuals with eRADAR scores in the top 15-20%, who are invited to a “brain health” visit that includes assessment of instrumental activities of daily living, depressive symptoms, and cognitive function (Montreal Cognitive Assessment). Results are shared with the patient and PCP, and the PCP is responsible for making the final diagnosis (with decision support provided through the EHR). To date, we have implemented the intervention in 9 clinics and conducted over 590 brain health visits. About 31% (658/2137) of high-risk individuals accept a brain health visit. Of these, about 18% have results suggesting dementia and another 30% mild cognitive impairment. Post-visit surveys show high acceptance of and satisfaction with the intervention. The primary study outcome is rate of dementia diagnosis over 12-month follow-up (completed by April 2025). We are also conducting semi-structured interviews to illuminate benefits and harms.
Innovation in Aging · 2024-12-01
article · Open access
Nearly 7 million people in the U.S. are living with dementia, but approximately half are undiagnosed. We are performing a randomized, controlled trial in two healthcare systems (Kaiser Permanente Washington-KPWA, University of California San Francisco-UCSF) of targeted dementia screening. Among primary care patients ≥ age 65, high-risk individuals are identified using a predictive algorithm, eRADAR (electronic health record Risk of Alzheimer’s and Dementia Assessment Rule), and invited for a “brain health” screening visit (BHV). Among 1709 KPWA patients in the intervention group, 506 (30%) completed a BHV; ongoing UCSF recruitment is similar (N=68/221; 31%). Visit completion rates were lower for participants of color (Asian/Black/Latinx) at 18% (N=51/336) compared with non-Hispanic whites at 34% (N=446/1,313), adjusted OR 0.36, 95% CI 0.26-0.49. Completion rates were 33% (N=30/90) among people aged 65-74 vs. 16% (N=44/268) for those aged 90+, adjusted OR per decade 0.71, 95% CI 0.56-0.90. Prior diagnoses of memory problems/concerns or mild cognitive impairment (MCI) were associated with higher odds of completion (adjusted OR 1.49; CI 1.13-1.95), and so was lower comorbidity score (adjusted OR 1.01; CI 1.00-1.03). eRADAR score and gender were not associated with BHV completion. The eRADAR study participation rate is similar to other dementia screening studies. Younger, white patients and those who might be aware of their cognitive deficits (as demonstrated by prior diagnoses) appear more likely to participate. As disease-modifying treatments increasingly become available, further research with communities of color to understand facilitators and barriers to engagement in primary-care dementia assessment is vital to prevent care disparities.
Importance of variables from different time frames for predicting self-harm using health system data
Journal of Biomedical Informatics · 2024-11-16 · 1 citation
article · Open access · Senior author
Safer and targeted use of antipsychotics in youth: an embedded, pragmatic randomized trial
Journal of Child Psychology and Psychiatry · 2024-10-29 · 2 citations
article · Open access
BACKGROUND: Antipsychotic medications (AP) are inappropriately prescribed to young people. The goal of this pragmatic trial was to test a four-component approach to improved targeting of antipsychotic prescribing to people aged ≥3 and <18 years. METHODS: Clinicians in four health systems were cluster randomized by the number of previous AP orders and service line (specialty mental health and all others). Intervention arm clinicians received a best practice alert and child psychiatrist consultation and feedback. Families received system navigation and expedited access to psychotherapy. Primary outcomes were total days' supply of AP medication and proportion of youth with any AP supply at 6 months. We estimated the log-odds of AP use at 6 months and the relative rate of AP over 6 months. The Safer and Targeted Use of Antipsychotics in Youth (SUAY) trial took place between 3/2018 and 12/2020. RESULTS: The trial enrolled 733 patients. The odds ratio (OR) comparing use at 6 months was 0.75 (95% CI: 0.52, 1.09). The mean number of days using AP was 118.5 for intervention patients and 128.2 for control patients (relative risk [RR] = 0.92; 95% CI: 0.81-1.04). Exploratory heterogeneity of treatment effects (HTE) was not detected in groups defined by age, gender, provider specialty, and insurance type. HTE by race/ethnicity was present: among youth of color, mean days' supply was 103.2 for the intervention arm and 131.2 for the control arm (RR 0.79, 95% CI: 0.67-0.93). Among secondary outcomes, only new psychotherapy referrals differed, with 44.3% (n = 154) of intervention participants having a new order for psychotherapy compared to 33.5% (n = 129) in the control arm (OR 1.47; 95% CI: 1.01-2.14). CONCLUSIONS: This intervention did not result in less AP use at 6 months or a reduction in the days' supply of AP medication, although psychotherapy orders increased. The intervention may be effective for some subgroups.
Prevention Science · 2024-01-15
erratum · Open access
Awards & honors
- Featured Health Care Heroes: Yates Coley (2024)
- Kaiser Permanente Washington Health Research Institute (2023…
- Dementia risk screening tool shows promise in health care se…
- Study examines racial inequity in suicide prediction models…
- Study IDs women who benefit less from 3D mammograms (2020)