Nigam Shah · Associate Professor of Medicine and of Biomedical Data Science

Stanford University · Biomedical Data Science

Active 2012–2024

h-index 2
Citations 27
Papers 153 last 5y

About

Nigam Shah is a faculty member involved in AI for Health at Stanford University. His research interests include precision healthcare, healthcare delivery, real-world data, causality, and value-based care. He is associated with Stanford Engineering and is part of the AI for Health team, contributing to advancements in applying artificial intelligence to improve health outcomes and healthcare systems.

Research topics

  • Computer Science
  • Natural Language Processing
  • Data science
  • Database
  • Multimedia
  • Marketing
  • Philosophy
  • Reliability engineering
  • Linguistics
  • Chemistry
  • Medicine
  • Engineering
  • Business

Selected publications

  • Red Teaming Large Language Models in Medicine: Real-World Insights on Model Behavior

    medRxiv (Cold Spring Harbor Laboratory) · 2024 · 9 citations

    • Computer Science
    • Artificial Intelligence
    • Political Science

    Background: The integration of large language models (LLMs) in healthcare offers immense opportunity to streamline healthcare tasks, but also carries risks such as response accuracy and bias perpetration. To address this, we conducted a red-teaming exercise to assess LLMs in healthcare and developed a dataset of clinically relevant scenarios for future teams to use. Methods: We convened 80 multi-disciplinary experts to evaluate the performance of popular LLMs across multiple medical scenarios. Teams composed of clinicians, medical and engineering students, and technical professionals stress-tested LLMs with real world clinical use cases. Teams were given a framework comprising four categories to analyze for inappropriate responses: Safety, Privacy, Hallucinations, and Bias. Prompts were tested on GPT-3.5, GPT-4.0, and GPT-4.0 with the Internet. Six medically trained reviewers subsequently reanalyzed the prompt-response pairs, with dual reviewers for each prompt and a third to resolve discrepancies. This process allowed for the accurate identification and categorization of inappropriate or inaccurate content within the responses. Results: There were a total of 382 unique prompts, with 1146 total responses across three iterations of ChatGPT (GPT-3.5, GPT-4.0, GPT-4.0 with Internet). 19.8% of the responses were labeled as inappropriate, with GPT-3.5 accounting for the highest percentage at 25.7% while GPT-4.0 and GPT-4.0 with internet performing comparably at 16.2% and 17.5% respectively. Interestingly, 11.8% of responses were deemed appropriate with GPT-3.5 but inappropriate in updated models, highlighting the ongoing need to evaluate evolving LLMs. Conclusion: The red-teaming exercise underscored the benefits of interdisciplinary efforts, as this collaborative model fosters a deeper understanding of the potential limitations of LLMs in healthcare and sets a precedent for future red teaming events in the field. Additionally, we present all prompts and outputs as a benchmark for future LLM model evaluations. 1-2 Sentence Description: As a proof-of-concept, we convened an interactive “red teaming” workshop in which medical and technical professionals stress-tested popular large language models (LLMs) through publicly available user interfaces on clinically relevant scenarios. Results demonstrate a significant proportion of inappropriate responses across GPT-3.5, GPT-4.0, and GPT-4.0 with Internet (25.7%, 16.2%, and 17.5%, respectively) and illustrate the valuable role that non-technical clinicians can play in evaluating models.
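
The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes each of the 382 unique prompts was sent once to each of the three models, which the abstract implies but does not state outright; the variable names are illustrative only.

```python
# Figures quoted in the abstract above.
UNIQUE_PROMPTS = 382
INAPPROPRIATE_PCT = {
    "GPT-3.5": 25.7,
    "GPT-4.0": 16.2,
    "GPT-4.0 with Internet": 17.5,
}

# Assumption: every prompt was sent exactly once to each model.
total_responses = UNIQUE_PROMPTS * len(INAPPROPRIATE_PCT)

# With equal denominators per model, the pooled rate is the plain mean
# of the per-model rates.
pooled_pct = sum(INAPPROPRIATE_PCT.values()) / len(INAPPROPRIATE_PCT)

print(total_responses)       # total number of responses
print(round(pooled_pct, 1))  # pooled percentage of inappropriate responses
```

Under that assumption the script reproduces the 1146 total responses and the 19.8% overall rate reported in the abstract.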

  • To do no harm — and the most good — with AI in health care

    Nature Medicine · 2024 · 69 citations

    • Political Science
    • Medicine
    • Nursing
  • FactEHR: A Dataset for Evaluating Factuality in Clinical Notes Using LLMs

    arXiv (Cornell University) · 2024 · 1 citation

    Senior author · Corresponding
    • Computer Science
    • Natural Language Processing

    Verifying and attributing factual claims is essential for the safe and effective use of large language models (LLMs) in healthcare. A core component of factuality evaluation is fact decomposition, the process of breaking down complex clinical statements into fine-grained atomic facts for verification. Recent work has proposed fact decomposition, which uses LLMs to rewrite source text into concise sentences conveying a single piece of information, to facilitate fine-grained fact verification. However, clinical documentation poses unique challenges for fact decomposition due to dense terminology and diverse note types and remains understudied. To address this gap and explore these challenges, we present FactEHR, an NLI dataset consisting of document fact decompositions for 2,168 clinical notes spanning four types from three hospital systems, resulting in 987,266 entailment pairs. We assess the generated facts on different axes, from entailment evaluation of LLMs to a qualitative analysis. Our evaluation, including review by the clinicians, reveals substantial variability in LLM performance for fact decomposition. For example, Gemini-1.5-Flash consistently generates relevant and accurate facts, while Llama-3 8B produces fewer and less consistent outputs. The results underscore the need for better LLM capabilities to support factual verification in clinical text.
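
The idea of fact decomposition feeding an NLI dataset can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper uses LLMs to decompose notes, whereas `naive_decompose` below is a simple sentence splitter standing in for that step, and the pairing direction (note as premise, fact as hypothesis) is one plausible construction, not the dataset's documented schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntailmentPair:
    """One NLI example: does the premise entail the hypothesis?"""
    premise: str
    hypothesis: str


def naive_decompose(note: str) -> list[str]:
    # Hypothetical stand-in for LLM-based fact decomposition:
    # treat each sentence as one "atomic fact".
    parts = re.split(r"(?<=[.!?])\s+", note)
    return [p.strip() for p in parts if p.strip()]


def build_pairs(note: str, facts: list[str]) -> list[EntailmentPair]:
    # Assumed pairing: the full note is the premise, each fact a hypothesis.
    return [EntailmentPair(premise=note, hypothesis=fact) for fact in facts]


note = "Patient denies chest pain. Blood pressure is stable."
facts = naive_decompose(note)
pairs = build_pairs(note, facts)

for pair in pairs:
    print(pair.hypothesis)
```

Each pair would then be passed to an entailment model or a human reviewer to judge whether the note supports the fact.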

  • Use of Machine Learning and Lay Care Coaches to Increase Advance Care Planning Conversations for Patients With Metastatic Cancer

    JCO Oncology Practice · 2022 · 31 citations

    • Medicine
    • Family medicine
    • Nursing

    PURPOSE: Patients with metastatic cancer benefit from advance care planning (ACP) conversations. We aimed to improve ACP using a computer model to select high-risk patients, with shorter predicted survival, for conversations with providers and lay care coaches. Outcomes included ACP documentation frequency and end-of-life quality measures. METHODS: In this study of a quality improvement initiative, providers in four medical oncology clinics received Serious Illness Care Program training. Two clinics (thoracic/genitourinary) participated in an intervention, and two (cutaneous/sarcoma) served as controls. ACP conversations were documented in a centralized form in the electronic medical record. In the intervention, providers and care coaches received weekly e-mails highlighting upcoming clinic patients with < 2 year computer-predicted survival and no prior prognosis documentation. Care coaches contacted these patients for an ACP conversation (excluding prognosis). Providers were asked to discuss and document prognosis. RESULTS: = .04). CONCLUSION: Combining a computer prognosis model with care coaches increased ACP documentation.
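
The selection rule described in the methods (upcoming clinic patients with under 2 years of computer-predicted survival and no prior prognosis documentation) amounts to a simple filter. The sketch below is only an illustration of that rule; the `UpcomingPatient` record, its field names, and the sample roster are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class UpcomingPatient:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    predicted_survival_years: float
    has_prognosis_documentation: bool


# Threshold taken from the abstract: predicted survival under 2 years.
SURVIVAL_THRESHOLD_YEARS = 2.0


def select_for_acp_outreach(patients: list[UpcomingPatient]) -> list[UpcomingPatient]:
    """Keep patients below the survival threshold with no prior prognosis note."""
    return [
        p
        for p in patients
        if p.predicted_survival_years < SURVIVAL_THRESHOLD_YEARS
        and not p.has_prognosis_documentation
    ]


# Made-up roster to exercise the filter.
roster = [
    UpcomingPatient("A", 1.2, False),  # short survival, undocumented: selected
    UpcomingPatient("B", 1.5, True),   # already has prognosis documentation
    UpcomingPatient("C", 3.0, False),  # predicted survival above threshold
]

selected_ids = [p.patient_id for p in select_for_acp_outreach(roster)]
print(selected_ids)
```

In the study, the resulting list was what the weekly e-mails to providers and care coaches highlighted.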

  • An open repository of real-time COVID-19 indicators

    Proceedings of the National Academy of Sciences · 2021 · 71 citations

    • Computer Science
    • Internet privacy
    • Data science

    The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: Operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID activity, augmenting traditional public health reporting and empowering research and decision-making.

  • Summarizing patients like mine via an on-demand consultation service

    Proceedings of the VLDB Endowment · 2021

    1st author · Corresponding
    • Computer Science
    • Data science

    Using evidence derived from previously collected medical records to guide patient care has been a long-standing vision of clinicians and informaticians, and one with the potential to transform medical practice. We offered an on-demand consultation service to derive evidence from millions of other patients' data to answer clinician questions and support their bedside decision making. We describe the design and implementation of the service as well as a summary of our experience in responding to the first 100 requests. We will also review a new paradigm for scalable time-aware clinical data search and describe the design, implementation, and use of a search engine realizing this paradigm.

  • Treatment and Monitoring Variability in US Metastatic Breast Cancer Care

    JCO Clinical Cancer Informatics · 2021 · 19 citations

    • Medicine
    • Oncology
    • Internal medicine

    PURPOSE: Treatment and monitoring options for patients with metastatic breast cancer (MBC) are increasing, but little is known about variability in care. We sought to improve understanding of MBC care and its correlates by analyzing real-world claims data using a search engine with a novel query language to enable temporal electronic phenotyping. METHODS: Using the Advanced Cohort Engine, we identified 6,180 women who met criteria for having estrogen receptor-positive, human epidermal growth factor receptor 2-negative MBC from IBM MarketScan US insurance claims (2007-2014). We characterized treatment, monitoring, and hospice usage, along with clinical and nonclinical factors affecting care. RESULTS: < .0001). CONCLUSION: Variability in US MBC care is explained by patient and disease factors and by nonclinical factors such as geographic region, suggesting that treatment decisions are influenced by local practice patterns and/or resources. A search engine designed to express complex electronic phenotypes from longitudinal patient records enables the identification of variability in patient care, helping to define disparities and areas for improvement.

  • Rates of Co-infection Between SARS-CoV-2 and Other Respiratory Pathogens

    JAMA · 2020 · 819 citations

    • Medicine
    • Virology
    • Intensive care medicine

    This study describes the prevalence of SARS-CoV-2 co-infection with noncoronavirus respiratory pathogens in a sample of symptomatic patients undergoing PCR testing in March 2020.

  • A predictive tool for identification of SARS-CoV-2 PCR-negative emergency department patients using routine test results

    Journal of Clinical Virology · 2020 · 62 citations

    • Medicine
    • Emergency medicine
    • Internal medicine
  • Data Quality Assessment of Laboratory Data

    AMIA · 2020

    Senior author · Corresponding
    • Computer Science
    • Reliability engineering
