William W Cohen · Professor

Carnegie Mellon University · Machine Learning Department

Active 1926–2026

h-index: 84
Citations: 41.2k
Papers: 503 (95 in the last 5 years)
Funding: $3.3M

About

William W. Cohen is a Professor in the Machine Learning Department at Carnegie Mellon University, with a joint appointment in the Language Technologies Institute. He also holds a 20%-time appointment as a Principal Scientist at Google, where he worked full-time from May 2018 to March 2024.

Cohen received his bachelor's degree in Computer Science from Duke University in 1984 and his PhD in Computer Science from Rutgers University in 1990. He worked at AT&T Bell Labs and AT&T Labs-Research from 1990 to 2000, and at Whizbang Labs, which focused on extracting information from the web, from 2000 to 2002. From 2002 to 2018 he was a member of Carnegie Mellon University's Machine Learning Department.

Cohen is a past president of the International Machine Learning Society and has served as an action editor for several journals and book series in AI and machine learning. He was General Chair of the 2008 International Machine Learning Conference and has co-chaired other conferences. He is an AAAI Fellow and has received several awards for influential papers, including the SIGMOD 'Test of Time' Award, the SIGIR 'Test of Time' Award, and the Semantic Web Science Association's Ten-Year Award.

His research interests include question answering, machine learning for NLP tasks, neuro-symbolic reasoning, and statistical relational learning. He holds seven patents related to learning, discovery, information retrieval, and data integration, and has authored more than 300 publications.

Research topics

  • Artificial Intelligence
  • Information Retrieval
  • Computer Science
  • Machine Learning
  • Natural Language Processing
  • Data Mining
  • World Wide Web

Selected publications

  • Multiple-Prediction-Powered Inference

    arXiv (Cornell University) · 2026-03-28

    preprint · Open access

    Statistical estimation often involves tradeoffs between expensive, high-quality measurements and a variety of lower-quality proxies. We introduce Multiple-Prediction-Powered Inference (MultiPPI): a general framework for constructing statistically efficient estimates by optimally allocating resources across these diverse data sources. This work provides theoretical guarantees about the minimax optimality, finite-sample performance, and asymptotic normality of the MultiPPI estimator. Through experiments across three diverse large language model (LLM) evaluation scenarios, we show that MultiPPI consistently achieves lower estimation error than existing baselines. This advantage stems from its budget-adaptive allocation strategy, which strategically combines subsets of models by learning their complex cost and correlation structures.

  • Semi-structured LLM Reasoners Can Be Rigorously Audited

    ArXiv.org · 2025-05-30

    preprint · Open access · Senior author

    Although Large Language Models (LLMs) have become capable reasoners, the problem of faithfulness persists: their reasoning can contain errors and omissions that are difficult to detect and that may obscure biases in model outputs. To address this issue, we introduce Semi-Structured Reasoning Models (SSRMs), which are trained to produce semi-structured representations of reasoning. SSRMs generate reasoning traces in a non-executable Pythonic syntax that names each reasoning step and marks its inputs and outputs. This structure allows SSRM traces to be automatically audited to identify reasoning flaws. We evaluate three types of audits: hand-crafted structured reasoning audits, written in a domain-specific language (DSL) implemented in Python; LLM-generated structured reasoning audits; and learned typicality audits, which apply probabilistic models over reasoning traces. We show that all of these methods can be used to effectively flag probable reasoning errors. Importantly, the auditability of SSRMs does not appear to compromise overall accuracy: in evaluation on twelve benchmarks and two model families, SSRMs demonstrate strong performance and generalizability relative to other models of comparable size.

  • Abstract TH146: Gaps in Hypertension Management within Populations Experiencing Food Insecurity

    Hypertension · 2025-09-01

    article

    Background: Hypertension (HTN) is the most prevalent risk factor associated with cardiovascular mortality. Food insecurity is a social determinant of health that plays a key role in determining one’s risk for HTN and CVD. Individuals experiencing food insecurity are more likely to face difficulty accessing affordable, healthy food options and healthcare. This study aims to assess the prevalence of uncontrolled hypertension within populations experiencing food insecurity in Chicago, IL. Methods: The Cardiometabolic Health Initiative (CHI) is a student-led organization that seeks to increase access to cardiometabolic screening within food insecure communities. CHI offers point-of-care cardiovascular screenings and health coaching at food pantries in Chicago, IL. Data collected include self-reported medical history and vitals. Patients' blood pressures (BP) were measured and categorized as normal (systolic blood pressure (SBP) <120 and diastolic blood pressure (DBP) <80), elevated (SBP 120-129 and DBP <80), Stage 1 (SBP 130-139 or DBP 80-89), Stage 2 (SBP >140 or DBP >90), or Hypertensive Crisis (SBP >180 or DBP >120). Results: BPs were recorded for 408 patients, of which 89 (21.81%) were categorized as normal, 31 (7.60%) as elevated, 105 (25.74%) as stage 1 HTN, 173 (42.40%) as stage 2 HTN, and 10 (2.45%) as in hypertensive crisis. Among the 408 patients, 182 (44.61%) patients self-reported not taking medication for HTN while 226 (55.39%) self-reported taking medication for HTN. Within the group of patients who take HTN medication, 19 (61.29%) had an elevated BP, 60 (57.14%) were in the stage 1 HTN range, 71 (41.04%) were in the stage 2 HTN range, and 4 (40.00%) were in hypertensive crisis. Furthermore, of the patients who do not take HTN medication, 12 (38.71%) had elevated BPs, 45 (42.86%) had BPs in the stage 1 HTN category, 102 (58.96%) in the stage 2 HTN category, and 6 (60.00%) were in hypertensive crisis. Conclusions: These findings suggest a high prevalence of uncontrolled HTN among patients screened at food pantries in Chicago, IL. This underscores key gaps in HTN management among patients experiencing food insecurity. Inequities that impact access to healthcare and levels of health literacy can contribute to difficulties controlling blood pressure among patients experiencing food insecurity, highlighting the need for additional community-based programs, like CHI, to expand access to preventative care and health education within at-risk communities.

  • 1295-P: The Association between Elevated A1C and ASCVD Risk Scores in a Food Insecure Population in Chicago

    Diabetes · 2025-06-13

    article

    Introduction and Objective: Studies show diabetes is a strong predictor of cardiovascular disease (CVD). Diabetes-related complications include coronary heart disease, cerebrovascular disease, heart failure and peripheral vascular disease. Food insecurity poses key barriers to glycemic management, potentially resulting in worse CVD outcomes. The goal of this study is to examine the association between elevated A1c and ASCVD risk scores in a food insecure population in West Chicago. Methods: The Cardiometabolic Health Initiative (CHI) is a mobile screening clinic that performs comprehensive cardiometabolic health screenings at food pantries in West Chicago. Between August 2023 and December 2024, patients received point of care A1c measurements, lipid panels, and blood pressure readings. Results were used to calculate 10-year Atherosclerotic Cardiovascular Disease (ASCVD) risk scores and provide individualized health coaching, focused on confronting social determinants of health (SDoH). Results: Out of 153 patients, 82 (54%) had a normal A1c (<5.7%), 51 (33%) had a prediabetic A1c (5.7%-6.4%) and 20 (13%) had a diabetic A1c (>6.4%) (Figure 1). The average ASCVD risk score for the total population was 9.1% (SD=11.3; Figure 2). Among patients with a normal A1c, the average ASCVD risk score was 6.6% (SD=7.9), among prediabetic patients, the average ASCVD risk score was 9.8% (SD=10.8) and among diabetic patients, the average ASCVD risk score was 17.6% (SD=18.4, p=0.000; Figure 2). Conclusion: These findings suggest a high prevalence of prediabetes and diabetes within this food insecure population with 46% of screened patients having an elevated A1c. Consistent with previous literature, elevated A1c values may directly correlate with increased ASCVD risk. Community-based preventative screenings and health education programs, like CHI, can help identify these high-risk individuals, address SDoH, and combat disease burden in disadvantaged populations. Disclosure C. Richter: None. E. Belnap: None. A. McIntosh: None. I. Khosla: None. W. Cohen: None. E. Sullivan: None. R. Garcia: None. A. DeMeo: None. D. Luger: None.

  • Abstract 4366661: Low-Cost Promotional Coronary Artery Calcium Screening Identifies High-Risk Patients Missed by Traditional Referral Pathways in Urban Hospital

    Circulation · 2025-11-03

    article

    Background: Coronary artery calcium (CAC) scoring is a key tool for risk stratification in preventive cardiology. While clinician-referred patients are typically higher-risk, some institutions adopted low-cost promotional CAC screening to enable risk assessment in individuals not reached by traditional referrals. Research Question: Among a diverse, urban cohort, do comorbidities, baseline preventive therapy, and CAC scores differ between patients undergoing clinician-referred vs. promotional CAC screening? Methods: A retrospective cohort study was conducted at a large urban academic center in Chicago. Adults undergoing CAC screening from Jan 2022 to Dec 2023 via clinician referral or low-cost promotion were included. The primary exposure was referral pathway. The primary outcome was CAC burden, categorized by Agatston score (0 = none, 1–99 = mild, 100–299 = moderate, ≥300 = severe) and stratified by coronary artery territory (Left Main, LAD, LCX, RCA). Baseline comorbidities and use of preventive medications were recorded. Descriptive statistics compared baseline characteristics. CAC coronary distributions were assessed using chi-square tests. Results: 1,743 patients were screened, 932 via clinical referral and 811 through promotion. Compared to referral patients, promotion patients had lower rates of HTN (26.1% vs. 38.4%, p<0.001), dyslipidemia (15.9% vs. 24.7%, p<0.001), and CAD (6.3% vs. 11.4%, p<0.001). Preventative medication use was lower in the promotion group: any statin (41.3% vs. 51.1%, p<0.001), high-intensity statin (14.9% vs. 24.4%, p<0.001), moderate-intensity statin (33.7% vs. 39.8%, p=0.008), aspirin (29.7% vs. 38.0%, p<0.001), and ezetimibe (2.0% vs. 4.6%, p=0.002). Referral patients had higher rates of overall Left Main (17.5% vs. 10.8%, p<0.001) and LCX (25.6% vs. 23.5%, p=0.044) CAC burden. LAD (42.2% vs. 46.9%, p=0.113) and RCA (27% vs. 28.6%, p=0.433) CAC burdens did not differ significantly between promotion and referral groups. Moderate/severe total CAC prevalence was similar between the promotion and referral cohorts (22.4% vs. 25.7%, p=0.098). Discussion: Promotion patients showed a high prevalence of non-zero CAC and similar moderate/severe CAC burden as referred patients, despite fewer comorbidities and lower medication use. These findings support low-cost promotional CAC screening as a practical method for detecting subclinical atherosclerosis and may enhance early risk detection in asymptomatic patients not typically reached by clinician referral.

  • Abstract 4366982: Sex and Racial Discordance in Referrals for Coronary Artery Calcium Screening at a Large Urban Center

    Circulation · 2025-11-03

    article

    Background: Coronary artery calcium (CAC) scoring is a non-invasive tool for detecting subclinical atherosclerosis. In 2022, RUSH University Medical Center launched a low-cost CAC screening initiative to broaden access and enhance early cardiovascular risk detection. This study evaluates referral patterns in a diverse urban population, addressing gaps in prior research. Research Question: Among patients referred for CAC screening at a large urban academic center, do referral patterns differ by patient and provider sex and race/ethnicity? Methods: We conducted a retrospective analysis of Chicago-based patients referred for CAC testing during the promotion (January 2022–December 2023). Demographic data, including sex and race/ethnicity, were collected for patients and referring providers. Descriptive statistics summarized distributions, and associations between patient and provider demographics were assessed using contingency tables. The relationship between patient and provider sex was assessed via Chi-square (χ²) (p < 0.05). Results: A total of 931 patients underwent CAC testing during the study period. Of these, 59.4% were female and 40.6% male. Race/ethnicity data were available for 380 patients; 41.7% identified as non-White: Black, Hispanic/Latino, Asian, and other groups. Female providers accounted for 53.3% of referrals, and male providers for 46.7%. Patient-provider sex concordance was strong: 69.1% of female patients were referred by female providers; 69.8% of male patients were referred by male providers. Female providers referred predominantly female patients (77.0%); male providers referred predominantly male patients (60.7%). A significant association was observed between patient and provider sex (χ²(1, N = 931) = 136.62, p<0.0001). Patient race also varied by provider sex. Female providers referred a higher proportion of Black patients (27.7% vs. 13.5%); male providers referred more White patients (64.8% vs. 52.8%). This distribution differed significantly by provider sex (χ²(4, N = 898) = 29.59, p<0.0001). Conclusion: Significant associations were found between patient and provider sex and race/ethnicity in referrals for CAC screening during a large promotional initiative. Female providers were more likely to refer female and racially diverse patients, while male providers more often referred male and White patients. Understanding these referral patterns may inform provider education and system-level strategies to promote equitable cardiovascular risk assessment.

  • CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

    ArXiv.org · 2025-03-30

    preprint · Open access

    Existing reasoning evaluation frameworks for Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) predominantly assess either text-based reasoning or vision-language understanding capabilities, with limited dynamic interplay between textual and visual constraints. To address this limitation, we introduce CrossWordBench, a benchmark designed to evaluate the reasoning capabilities of both LLMs and LVLMs through the medium of crossword puzzles -- a task requiring multimodal adherence to semantic constraints from text-based clues and intersectional constraints from visual grid structures. CrossWordBench leverages a controllable puzzle generation framework that produces puzzles in two formats (text and image), supports adjustable difficulty through prefill ratio control, and offers different evaluation strategies, ranging from direct puzzle solving to interactive modes. Our extensive evaluation of over 20 models reveals that reasoning LLMs substantially outperform non-reasoning models by effectively leveraging crossing-letter constraints. We further demonstrate that LVLMs struggle with the task, showing a strong correlation between their puzzle-solving performance and grid-parsing accuracy. Our findings highlight limitations of the reasoning capabilities of current LLMs and LVLMs, and provide an effective approach for creating multimodal constrained tasks for future evaluations.

  • Prevalence and underdiagnosis of diabetes mellitus in a food insecure population

    Scientific Reports · 2025-04-10

    article · Open access

    Food insecurity is a public health issue and a major risk factor for overall worse health outcomes including hypertension, diabetes, coronary heart disease, congestive heart failure, stroke, chronic kidney disease and obesity. Food-insecure patients are more likely to have both diagnosed and undiagnosed prediabetes and diabetes. This study examines the prevalence and self-awareness of diabetes and prediabetes in an at-risk, food-insecure population. The Cardiometabolic Health Initiative (CHI) is a community service organization that provides comprehensive cardiometabolic screenings at food pantries in West Chicago. Between August 2023 and December 2024, 191 patients were screened using point-of-care A1c tests. The average A1c of the population was 6.04%. Ninety-six patients had a normal A1c (< 5.7%), 66 had a prediabetic A1c (5.7-6.4) and 29 had a diabetic A1c (> 6.4). Forty-two patients self-reported a history of DM. The average A1c for the self-reported DM group was 7.58% and the average A1c for the non-reported group was 5.60%. Among the self-reported DM group, 24 patients had controlled DM (A1c < 7%) and 18 had uncontrolled DM (A1c > 7%). Among the non-reported group, 56 had a prediabetic A1c and 3 had a diabetic A1c. The presented findings suggest a high prevalence of diabetes and prediabetes within a food-insecure population in West Chicago. Further, this study suggests that many diabetic patients struggle to control their A1c levels. Our findings reflect many barriers presented to food insecure patients that can hinder diabetes diagnosis, education, and management.

  • Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation

    arXiv (Cornell University) · 2024-06-06 · 2 citations

    preprint · Open access · Senior author

    Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. PPI achieves this by combining small amounts of human-labeled data with larger amounts of data labeled by a reasonably accurate -- but potentially biased -- automatic system, in a way that results in tighter confidence intervals for certain parameters of interest (e.g., the mean performance of a language model). In this paper, we propose a method called Stratified Prediction-Powered Inference (StratPPI), in which we show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies. Without making any assumptions on the underlying automatic labeling system or data distribution, we derive an algorithm for computing provably valid confidence intervals for population parameters (such as averages) that is based on stratified sampling. In particular, we show both theoretically and empirically that, with appropriate choices of stratification and sample allocation, our approach can provide substantially tighter confidence intervals than unstratified approaches. Specifically, StratPPI is expected to improve in cases where the performance of the autorater varies across different conditional distributions of the target data.
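
The prediction-powered inference idea described in the last abstract above (a small human-labeled sample corrects the bias of a larger automatically labeled one) can be sketched for the simplest case, estimating a mean. This is a minimal illustration of the basic, unstratified PPI mean estimate with a normal-approximation interval. It is not the StratPPI or MultiPPI method from the papers, and the function name `ppi_mean` and its parameters are hypothetical.

```python
import math
import statistics


def ppi_mean(human_labels, auto_on_labeled, auto_on_unlabeled, z=1.96):
    """Basic prediction-powered estimate of a mean (illustrative sketch).

    human_labels:      human scores for the small labeled sample
    auto_on_labeled:   autorater scores for those same items
    auto_on_unlabeled: autorater scores for the large unlabeled sample
    z:                 normal quantile (1.96 gives roughly a 95% interval)
    """
    n = len(human_labels)
    big_n = len(auto_on_unlabeled)

    # Rectifier: how far the autorater is off, measured where humans also labeled.
    residuals = [y - f for y, f in zip(human_labels, auto_on_labeled)]

    # Autorater mean on the large sample, shifted by the average residual.
    estimate = statistics.fmean(auto_on_unlabeled) + statistics.fmean(residuals)

    # The two sample means are independent, so their variances add.
    variance = (statistics.variance(auto_on_unlabeled) / big_n
                + statistics.variance(residuals) / n)
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)
```

When the autorater tracks the human labels closely, the residuals have small variance, so the interval is driven mostly by the large automatically labeled sample; that is the source of the tighter intervals the abstract mentions.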

Frequent coauthors

  • Bhuwan Dhingra

    56 shared
  • Ruslan Salakhutdinov

    44 shared
  • Zhilin Yang

    34 shared
  • Kenneth R. Koedinger

    Carnegie Mellon University

    34 shared
  • Haitian Sun

    Nanjing University of Chinese Medicine

    32 shared
  • Kathryn Mazaitis

    23 shared
  • Einat Minkov

    22 shared
  • Noboru Matsuda

    21 shared

Education

  • B.S.

    Duke University

    1984
  • Ph.D.

    Rutgers University

    1990

Awards & honors

  • AAAI Fellow