Jonathan Nebeker
Professor · Verified · University of Utah · Geriatrics
Active 1990–2025
About
Jonathan Nebeker, MD, MS, is a Professor of Medicine at the University of Utah and the Chief Medical Informatics Officer (CMIO) at the Veterans Health Administration (VHA) central office. His academic background includes degrees and training at Harvard University and the University of Pennsylvania. He practices geriatrics at the Salt Lake City VA Medical Center and has been the clinical and/or informatics lead of all Electronic Health Record (EHR)-related and health IT modernization programs at VA from 2014 through the present. His research focuses on adverse drug events, human interface design, and analytical systems/advanced process analysis; his work on the characterization, epidemiology, and prevention of adverse drug events is widely cited. Much of his research examines whether and how EHRs help prevent adverse drug events, with particular attention to translating the basic science of cognitive and social psychology into medical informatics and EHR design. From 2005 to 2015, he concentrated on this translation, demonstrating through randomized controlled trials that novel user-interface designs can increase diagnostic accuracy and reduce time to diagnosis. He established the scientific computing infrastructure for the Veterans Health Administration in 2008. His current research emphasizes machine learning and artificial intelligence for human and electronic process control to support a highly reliable, learning health system. Dr. Nebeker directs active projects on translating cognitive and social science theories into software design for clinicians, real-time surveillance of safe EHR operation, and analytics-driven information systems.
Research topics
- Data Mining
- Computer Science
- Knowledge management
- Medicine
- Nursing
- Medical emergency
Selected publications
The Semantic Clinical Artificial Intelligence (SCAI) Chatbot: Preliminary Usability Testing
Studies in health technology and informatics · 2025-05-12
Article · Open access
We developed a doctor-facing chatbot named SCAI. We aimed to evaluate response speed and user satisfaction. Ten questions were used to test the response speed of the SCAI chatbot. The response time was long, but overall the participants were satisfied with SCAI.
Semantic Relations: Extending SNOMED CT and Solor
Applied Clinical Informatics · 2025-08-01
Article · Open access
Terminologies, such as Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and Solor, assist with knowledge representation and management, data integration, and triggering clinical decision support (CDS) rules. Semantic relations in these terminologies provide explicit meaning in compositional expressions, which assist with many of the above-listed activities. The aims of this research are to: (1) identify semantic relations that are not fully present in SNOMED CT and Solor and (2) use these identified semantic relations with terms that are currently present in SNOMED CT and Solor to form triples. We identified relations that were not fully present in either SNOMED CT or Solor and were important for VA Knowledge Artifacts (KNARTS). These terms and the relations were formed into triples. The relations, terms, classifications, and sentences were used to implement the relations in the High Definition-Natural Language Processing (HD-NLP) program. There are a total of 38 semantic relations. These had use cases built for each and were implemented in the Solor HD-NLP server for tagging of KNARTS. These new SNOMED CT and Solor semantic relations will give clinicians the ability to add more detail and meaning to their clinical notes. This can improve our ability to trigger CDS rules, leading to improved CDS provided to clinicians during patient care.
Retrieval Augmented Generation: What Works and Lessons Learned
Studies in health technology and informatics · 2025-05-12 · 2 citations
Article · Open access
Retrieval Augmented Generation has been shown to improve the output of large language models (LLMs) by providing context to the question or scenario posed to the model. We have tried a series of experiments to understand how best to improve the performance of the native models. We present the results of each of several experiments. These can serve as lessons learned for scientists looking to improve the performance of large language models for medical question answering tasks.
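As a rough illustration of the retrieval step the abstract above refers to, the sketch below ranks a few passages against a question and places the best match in the prompt as context. It is not the method used in the paper: the passages, the bag-of-words cosine scoring, the prompt layout, and the function names (`retrieve`, `build_prompt`) are all assumptions made for this example, and no language model is called.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy knowledge base (illustrative only, not taken from the paper).
PASSAGES = [
    "Metformin is a first-line oral agent for type 2 diabetes.",
    "Warfarin dosing is monitored with the INR.",
    "Amoxicillin is a penicillin-class antibiotic.",
]

prompt = build_prompt("Which test is used to monitor warfarin?", PASSAGES, k=1)
print(prompt)
```

A production system would more likely use dense embeddings and a vector index in place of the word-count similarity shown here; the sketch only shows where retrieved context enters the prompt.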
IEEE Journal of Biomedical and Health Informatics · 2025-01-09 · 7 citations
Article · Open access · Senior author
In a large hospital system, a network of hospitals relies on electronic health records (EHRs) to make informed decisions regarding their patients in various clinical domains. Consequently, the dependability of the health information technology (HIT) systems responsible for collecting EHR data is of utmost importance for patient safety. Recently, novel methods and tools aimed at identifying anomalies in EHR data to bolster the reliability of HIT systems have been introduced. However, these existing methods and tools primarily concentrate on individual hospitals, which limits our understanding of system-wide anomalous events and their potential impact on patient safety across multiple hospitals. In this article, we introduce a new approach to detecting anomalies in EHR data within a network of hospitals. This is achieved by combining advanced machine learning techniques with graph algorithms to create a tool capable of swiftly identifying and responding to deviations. Our proposed approach employs a combination of five machine learning models, harnessing the unique strengths of each model to provide a more robust detection system. The detected anomalies are then represented as graphs, allowing us to recognize patterns across the hospital network. This aids in identifying anomalies that span multiple medical facilities, potentially indicating broader system-level risks. Extensive real-world testing of our approach demonstrated its ability to offer actionable insights compared to existing methods. Additionally, its scalable design ensures seamless integration into existing HIT infrastructures.
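The abstract above describes two ideas: combining several detectors, then viewing the flagged anomalies as a graph over the hospital network. A minimal sketch of one way these could fit together follows. The abstract does not say how the five models are combined or how the graph is built, so the majority vote, the same-day grouping, the adjacency map, and every name below are assumptions for illustration only.

```python
from collections import defaultdict


def majority_vote(flags):
    """True when more than half of the detectors flag the observation."""
    return sum(flags) > len(flags) / 2


def flagged_events(votes):
    """votes maps (hospital, day) to one boolean per detector."""
    return {key for key, flags in votes.items() if majority_vote(flags)}


def cross_site_clusters(events, adjacency):
    """Group hospitals flagged on the same day that are linked in the network.

    Returns (day, [hospitals]) for every connected group of two or more sites.
    """
    by_day = defaultdict(set)
    for hospital, day in events:
        by_day[day].add(hospital)

    clusters = []
    for day, hospitals in sorted(by_day.items()):
        seen = set()
        for start in sorted(hospitals):
            if start in seen:
                continue
            # Depth-first search restricted to hospitals flagged on this day.
            component, stack = set(), [start]
            while stack:
                node = stack.pop()
                if node in component:
                    continue
                component.add(node)
                stack.extend(
                    n for n in adjacency.get(node, ())
                    if n in hospitals and n not in component
                )
            seen |= component
            if len(component) > 1:
                clusters.append((day, sorted(component)))
    return clusters


# Hypothetical detector outputs: five detectors per (hospital, day).
votes = {
    ("A", 1): [True, True, True, False, False],
    ("B", 1): [True, True, True, True, False],
    ("C", 1): [False, False, True, False, False],
    ("D", 1): [True, True, True, True, True],
    ("A", 2): [True, True, False, True, False],
}
# Hypothetical links between hospitals (for example, shared infrastructure).
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

clusters = cross_site_clusters(flagged_events(votes), adjacency)
print(clusters)
```

In this toy data, hospitals A and B are flagged on the same day and are directly linked, so they form one multi-site cluster; D is flagged but its only neighbor is not, so it is treated as a single-site event.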
Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE
JAMA Network Open · 2025-04-22 · 10 citations
Article · Open access
Importance: Large language models (LLMs) are being implemented in health care. Enhanced accuracy and methods to maintain accuracy over time are needed to maximize LLM benefits. Objective: To evaluate whether LLM performance on the US Medical Licensing Examination (USMLE) can be improved by including formally represented semantic clinical knowledge. Design, Setting, and Participants: This comparative effectiveness research study was conducted between June 2024 and February 2025 at the Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, New York, using sample questions from the USMLE Steps 1, 2, and 3. Intervention: Semantic clinical artificial intelligence (SCAI) was developed to insert formally represented semantic clinical knowledge into LLMs using retrieval augmented generation (RAG). Main Outcomes and Measures: The SCAI method was evaluated by comparing the performance of 3 Llama LLMs (13B, 70B, and 405B; Meta) with and without SCAI RAG on text-based questions from the USMLE Steps 1, 2, and 3. LLM accuracy for answering questions was determined by comparing the LLM output with the USMLE answer key. Results: The LLMs were tested on 87 questions in the USMLE Step 1, 103 in Step 2, and 123 in Step 3. The 13B LLM enhanced by SCAI RAG was associated with significantly improved performance on Steps 1 and 3 but only met the 60% passing threshold on Step 3 (74 questions correct [60.2%]). The 70B and 405B LLMs passed all the USMLE steps with and without SCAI RAG. The SCAI RAG 70B model scored 80 questions (92.0%) correctly on Step 1, 82 (79.6%) on Step 2, and 112 (91.1%) on Step 3. The SCAI RAG 405B model scored 79 (90.8%) correctly on Step 1, 87 (84.5%) on Step 2, and 117 (95.1%) on Step 3. Significant improvements associated with SCAI RAG were found for the 13B model on Steps 1 and 3, the 70B model on Step 2, and the 405B parameter model on Step 3. The 70B model was significantly better than the 13B model, and the 405B model was not significantly better than the 70B model. Conclusions and Relevance: In this comparative effectiveness research study, SCAI RAG was associated with significantly improved scores on the USMLE Steps 1, 2, and 3. The 13B model passed Step 3 with RAG, and the 70B and 405B models passed and scored well on Steps 1, 2, and 3 with or without augmentation. New forms of reasoning by LLMs, like semantic reasoning, have potential to improve the accuracy of LLM performance on important medical questions. Improving LLM performance in health care with targeted, up-to-date clinical knowledge is an important step in LLM implementation and acceptance.
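The percentages reported in the abstract above can be reproduced from the question counts it gives. The short check below uses only numbers stated in the abstract; the dictionary layout and the `accuracy` helper are naming choices made for this example.

```python
# Questions per exam step, as reported in the abstract.
TOTALS = {"Step 1": 87, "Step 2": 103, "Step 3": 123}

# Correct answers with SCAI RAG, as reported in the abstract.
CORRECT = {
    "13B + SCAI RAG": {"Step 3": 74},
    "70B + SCAI RAG": {"Step 1": 80, "Step 2": 82, "Step 3": 112},
    "405B + SCAI RAG": {"Step 1": 79, "Step 2": 87, "Step 3": 117},
}

PASSING_THRESHOLD = 60.0  # percent, as stated in the abstract


def accuracy(correct, total):
    """Percentage of questions answered correctly, to one decimal place."""
    return round(100 * correct / total, 1)


table = {
    model: {step: accuracy(n, TOTALS[step]) for step, n in steps.items()}
    for model, steps in CORRECT.items()
}

for model, steps in table.items():
    for step, pct in steps.items():
        status = "pass" if pct >= PASSING_THRESHOLD else "fail"
        print(f"{model:16s} {step}: {pct:5.1f}% ({status})")
```

Each computed value matches the figure in the abstract; for example, 74 of 123 correct on Step 3 rounds to 60.2%, just above the 60% threshold.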
Medical Care · 2025-04-22
Article
OBJECTIVES: To demonstrate an innovative method combining machine learning with comparative effectiveness research techniques and to investigate a hitherto unstudied question about the effectiveness of common prescribing patterns. DATA SOURCES: United States Veterans Health Administration Corporate Data Warehouse. STUDY DESIGN: For Operation Enduring Freedom/Operation Iraqi Freedom veterans with major depressive disorder, we generate pharmacotherapy pathways (of antidepressants) using process mining and machine learning. We select the medication episodes that were started at subtherapeutic doses by the first assigned primary care physician and observe the paths that those medication episodes follow. Using 2-stage least squares, we test the effectiveness of starting at a low dose and staying low for longer versus ramping up fast while balancing observable and unobservable characteristics of patients and providers through instrumental variables. We leverage predetermined provider practice patterns as instruments. DATA COLLECTION: We collected outpatient pharmacy data for selective serotonin reuptake inhibitors and selective norepinephrine reuptake inhibitors, patient and provider characteristics (as control variables), and the instruments for our cohort. All data were extracted for the period between 2006 and 2020. PRINCIPAL FINDINGS: There is a statistically significant positive effect (0.68, 95% CI 0.11-1.25) of "ramping up fast" on engagement in care. When we examine the effect of "ramping up slow", we see an insignificant negative impact on engagement in care (-0.82, 95% CI -1.89 to 0.25). As expected, the probability of drop-out also seems to have a negative effect on engagement in care (-0.39, 95% CI -0.94 to 0.17). We further validate these results by testing with medication possession ratios calculated periodically as an alternative engagement in care metric. CONCLUSIONS: Our findings contradict the "Start low, go slow" adage, indicating that ramping up the dose of an antidepressant faster has a significantly positive effect on engagement in care for our population.
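For readers unfamiliar with 2-stage least squares, the sketch below shows the basic mechanics on synthetic data. It is not the study's model: it uses one made-up instrument, no control variables, and does not compute standard errors, and the variable names and coefficients are invented for illustration. The point is only that an unobserved confounder biases the ordinary regression slope, while the instrumented estimate lands near the true effect.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Estimate the effect of x on y using z as an instrument for x."""
    # Stage 1: predict the treatment from the instrument.
    intercept, slope = ols(z, x)
    x_hat = [intercept + slope * zi for zi in z]
    # Stage 2: regress the outcome on the predicted treatment.
    _, beta = ols(x_hat, y)
    return beta


# Synthetic data (assumption): u is an unobserved confounder that affects both
# the treatment x and the outcome y; z affects y only through x.
rng = random.Random(0)
TRUE_EFFECT = 0.5
z, x, y = [], [], []
for _ in range(5000):
    zi = rng.gauss(0, 1)
    u = rng.gauss(0, 1)
    xi = 0.8 * zi + u + rng.gauss(0, 1)
    yi = TRUE_EFFECT * xi + 2.0 * u + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

naive = ols(x, y)[1]
iv = two_stage_least_squares(z, x, y)
print(f"naive OLS slope: {naive:.2f}")
print(f"2SLS estimate:   {iv:.2f} (true effect {TRUE_EFFECT})")
```

With several instruments and control variables, both stages become multiple regressions, and standard errors need a correction beyond what a second ordinary regression reports; a statistics package would normally handle both.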
ENHANCING HEALTHCARE VALUE THROUGH INTEROPERABLE E-CARE PLANS: A FRAMEWORK FOR QUALITY IMPROVEMENT
Innovation in Aging · 2024-12-01 · 1 citation
Article · Open access · Senior author
Patient-centered data is critical for providing high-value care and assessing quality including functional and cognitive status, goals, and outcomes that matter most to people; these are often missing in Electronic Health Records (EHRs). Furthermore, lack of interoperability makes it challenging to share and aggregate data across clinicians and care settings, leading to fragmented and/or duplicative care. E-care plans (eCP) and data standards that focus on goal setting and outcome management are in development, and hold promise to support person-centered care planning across sites and settings of care. E-care plans support care coordination and quality improvement and inform future healthcare value-focused policies, such as those related to equitable coverage, quality improvement, measurement, payment, and innovation model planning. Interoperable eCPs could be integrated into clinical care and across the healthcare ecosystem. This integration will enhance care coordination and facilitate seamless information exchange. In this session, Federal officials from the CMS, VA, and AHRQ will explore the benefits of eCPs and discuss the work underway to support the implementation of these plans in clinical care. This presentation will also discuss efforts to develop value sets and FHIR mappings to enable bidirectional information exchange and the concurrent pilot testing of applications with patient data. E-care plans have the potential to improve patient safety, reduce burden, and support the efficient exchange of patient data such as cognitive function, physical function, and individual goals to support improving person-centered outcomes, advancing health equity, conducting economic analysis, and developing tailored programs.
JMIR Public Health and Surveillance · 2024-02-15 · 11 citations
Article · Open access
BACKGROUND: There have been over 772 million confirmed cases of COVID-19 worldwide. A significant portion of these infections will lead to long COVID (post-COVID-19 condition) and its attendant morbidities and costs. Numerous life-altering complications have already been associated with the development of long COVID, including chronic fatigue, brain fog, and dangerous heart rhythms. OBJECTIVE: We aim to derive an actionable long COVID case definition consisting of significantly increased signs, symptoms, and diagnoses to support pandemic-related clinical, public health, research, and policy initiatives. METHODS: This research employs a case-crossover population-based study using International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) data generated at Veterans Affairs medical centers nationwide between January 1, 2020, and August 18, 2022. In total, 367,148 individuals with ICD-10-CM data both before and after a positive COVID-19 test were selected for analysis. We compared ICD-10-CM codes assigned 1 to 7 months following each patient's positive test with those assigned up to 6 months prior. Further, 350,315 patients had novel codes assigned during this window of time. We defined signs, symptoms, and diagnoses as being associated with long COVID if they had a novel case frequency of ≥1:1000, and they significantly increased in our entire cohort after a positive test. We present odds ratios with CIs for long COVID signs, symptoms, and diagnoses, organized by ICD-10-CM functional groups and medical specialty. We used our definition to assess long COVID risk based on a patient's demographics, Elixhauser score, vaccination status, and COVID-19 disease severity. RESULTS: We developed a long COVID definition consisting of 323 ICD-10-CM diagnosis codes grouped into 143 ICD-10-CM functional groups that were significantly increased in our 367,148 patient post-COVID-19 population. We defined 17 medical-specialty long COVID subtypes such as cardiology long COVID. Patients who were COVID-19-positive developed signs, symptoms, or diagnoses included in our long COVID definition at a proportion of at least 59.7% (268,320/449,450, based on a denominator of all patients who were COVID-19-positive). The long COVID cohort was 8 years older with more comorbidities (2-year Elixhauser score 7.97 in the patients with long COVID vs 4.21 in the patients with non-long COVID). Patients who had a more severe bout of COVID-19, as judged by their minimum oxygen saturation level, were also more likely to develop long COVID. CONCLUSIONS: An actionable, data-driven definition of long COVID can help clinicians screen for and diagnose long COVID, allowing identified patients to be admitted into appropriate monitoring and treatment programs. This long COVID definition can also support public health, research, and policy initiatives. Patients with COVID-19 who are older or have low oxygen saturation levels during their bout of COVID-19, or those who have multiple comorbidities should be preferentially watched for the development of long COVID.
A framework for inferring and analyzing pharmacotherapy treatment patterns
BMC Medical Informatics and Decision Making · 2024-03-08 · 2 citations
Article · Open access · Senior author
BACKGROUND: To discover pharmacotherapy prescription patterns and their statistical associations with outcomes through a clinical pathway inference framework applied to real-world data. METHODS: We apply machine learning steps in our framework using a 2006 to 2020 cohort of veterans with major depressive disorder (MDD). Outpatient antidepressant pharmacy fills, dispensed inpatient antidepressant medications, emergency department visits, self-harm, and all-cause mortality data were extracted from the Department of Veterans Affairs Corporate Data Warehouse. RESULTS: Our MDD cohort consisted of 252,179 individuals. During the study period there were 98,417 emergency department visits, 1,016 cases of self-harm, and 1,507 deaths from all causes. The top ten prescription patterns accounted for 69.3% of the data for individuals starting antidepressants at the fluoxetine equivalent of 20-39 mg. Additionally, we found associations between outcomes and dosage change. CONCLUSIONS: For 252,179 Veterans who served in Iraq and Afghanistan with subsequent MDD noted in their electronic medical records, we documented and described the major pharmacotherapy prescription patterns implemented by Veterans Health Administration providers. Ten patterns accounted for almost 70% of the data. Associations between antidepressant usage and outcomes in observational data may be confounded. The low numbers of adverse events, especially those associated with all-cause mortality, make our calculations imprecise. Furthermore, our outcomes are also indications for both disease and treatment. Despite these limitations, we demonstrate the usefulness of our framework in providing operational insight into clinical practice, and our results underscore the need for increased monitoring during critical points of treatment.
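The statement that the top ten prescription patterns account for 69.3% of the data is a coverage figure: count how often each distinct sequence occurs, keep the most frequent ones, and divide by the total. A minimal sketch of that calculation follows, using made-up sequences; the step labels, the toy data, and the `top_k_coverage` function are assumptions for this example and do not reflect the paper's inference framework.

```python
from collections import Counter


def top_k_coverage(sequences, k):
    """Return the k most common sequences and the share of data they cover."""
    counts = Counter(tuple(seq) for seq in sequences)
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    return top, covered / len(sequences)


# Hypothetical per-patient sequences of dosing steps (illustrative labels only).
sequences = (
    [["low", "low"]] * 5
    + [["low", "up"]] * 3
    + [["low", "stop"]] * 2
)

top, share = top_k_coverage(sequences, k=2)
for pattern, n in top:
    print(" -> ".join(pattern), n)
print(f"top 2 patterns cover {share:.0%} of sequences")
```

Here the two most common patterns cover 8 of the 10 toy sequences.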
EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records
Journal of Biomedical Informatics · 2024-02-01 · 21 citations
Article · Open access · Senior author
Recent grants
NIH · $540k · 2011
Frequent coauthors
- Matthew H. Samore · University of Utah · 91 shared
- Merry Ward · United States Department of Veterans Affairs · 58 shared
- Brian C. Sauer · Lake City VA Medical Center · 57 shared
- Lucy A. Savitz · UPMC Center for High Value Health Care · 49 shared
- Paul R. Yarnold · Optimal Solutions (United States) · 45 shared
- Charles L. Bennett · 45 shared
- Randall Rupper · University of Utah · 44 shared
- Wu Xu · University of Louisiana at Lafayette · 44 shared
Education
M.D.
University of Pennsylvania
M.S.
Harvard