
Tessa Sundaram Cook
· MD PhD FSIIM FCPP FAAR FACR · Verified · University of Pennsylvania · Radiology
Active 1941–2026
About
Tessa Sundaram Cook, MD PhD FSIIM FCPP FAAR FACR, is an Associate Professor of Radiology at the Hospital of the University of Pennsylvania. She is an active member of the medical staff at Penn Presbyterian Medical Center, The Chester County Hospital, Pennsylvania Hospital, and Penn Medicine Princeton Health. Dr. Cook serves as an Attending Radiologist at the Hospital of the University of Pennsylvania and holds the position of Chief of 3-D and Advanced Imaging in the Department of Radiology. She is also the Director of the Imaging Informatics Fellowship at the University of Pennsylvania and Vice Chair of Practice Transformation in the same department. Her clinical expertise includes cardiac CT and MRI, vascular CTA and MRA, thoracic and abdominal CT, radiography, and DXA. Her research focuses on imaging informatics, workflow optimization, artificial intelligence, follow-up monitoring, innovation and practice transformation, patient-centered care, and clinical informatics. Dr. Cook has contributed to the field through her involvement in various research projects and publications, emphasizing advancements in medical imaging and informatics.
Research topics
- Computer Science
- Political Science
- Artificial Intelligence
- Data science
- Medical emergency
- Medicine
- Psychology
Selected publications
ArXiv.org · 2026-05-10
article · Open access · Senior author
Transparent and standardized reporting is essential for reproducible scientific research, yet adherence to reporting guidelines remains inconsistent because of the manual effort required to select and complete checklists. We present CheckSupport, an open-source, locally deployable system that uses large language models to automate the recommendation of reporting checklists and the evidence-grounded completion of checklists for scientific manuscripts. CheckSupport employs a staged prompting strategy that decomposes reporting workflows into constrained inference tasks, prioritizing faithful extraction over generative text synthesis. All inference is performed locally using instruction-tuned models, preserving data privacy and enabling reproducible, auditable workflows. Evaluated on a corpus of peer-reviewed manuscripts, CheckSupport achieved 90% overall accuracy for checklist recommendations and 88% overall accuracy for item-level completion while operating on CPU-only hardware. On average, the wall-clock time per manuscript was 12.5 seconds, including the checklist recommendation and full checklist completion. These results demonstrate that large language models, when applied as structured inference components, can reduce reporting burden and support more transparent and reproducible scientific reporting across disciplines.
Positive act of reporting negative results in large language model research: a call for transparency
Journal of the American Medical Informatics Association · 2026-01-16
article · Open access · Senior author
PURPOSE: To highlight the importance of reporting negative results in large language model (LLM) research, particularly as these systems are increasingly integrated into healthcare. POTENTIAL: LLMs offer transformative capabilities in text generation, summarization, and clinical decision support. Transparent documentation of both successes and failures can accelerate innovation, improve reproducibility, and guide safe deployment. CAUTION: Publication bias toward positive findings conceals model limitations, biases, and reproducibility challenges. In healthcare, underreporting failures risks patient safety, ethical lapses, and wasted resources. Structural barriers, including a lack of standards and limited funding for failure analysis, perpetuate this cycle. CONCLUSIONS: Negative results should be recognized as valuable contributions that delineate the boundaries of LLM applicability. Structured reporting, educational initiatives, and stronger incentives for transparency are essential to ensure responsible, equitable, and trustworthy use of LLMs in healthcare.
ArXiv.org · 2026-01-21
article · Open access
Multimodal large language models have demonstrated performance comparable to that of radiology trainees on multiple-choice board-style exams. However, to develop clinically useful multimodal LLM tools, high-quality benchmarks curated by domain experts are essential. This work aimed to curate released and holdout datasets of 100 chest radiographic studies each and to propose an artificial intelligence (AI)-assisted expert labeling procedure that allows radiologists to label studies more efficiently. A total of 13,735 deidentified chest radiographs and their corresponding reports from the MIDRC were used. GPT-4o extracted abnormal findings from the reports, which were then mapped to 12 benchmark labels with a locally hosted LLM (Phi-4-Reasoning). From these studies, 1,000 were sampled on the basis of the AI-suggested benchmark labels for expert review; the sampling algorithm ensured that the selected studies were clinically relevant and captured a range of difficulty levels. Seventeen chest radiologists participated, and they marked "Agree all", "Agree mostly", or "Disagree" to indicate their assessment of the correctness of the LLM-suggested labels. Each chest radiograph was evaluated by three experts. Of these, at least two radiologists selected "Agree all" for 381 radiographs. From this set, 200 were selected, prioritizing those with less common or multiple finding labels, and divided into 100 released radiographs and 100 reserved as the holdout dataset. The holdout dataset is used exclusively by RSNA to independently evaluate different models. A benchmark of 200 chest radiographic studies with 12 benchmark labels was created and made publicly available at https://imaging.rsna.org, with each chest radiograph verified by three radiologists. In addition, an AI-assisted labeling procedure was developed to help radiologists label at scale, minimize unnecessary omissions, and support a semicollaborative environment.
Radiology Artificial Intelligence · 2026-03-18
article
A transformer-based framework integrating longitudinal multimodal medical data from chest radiographs and CT images achieved robust performance in clinical outcome prediction in patients with COVID-19.
Episode Charges and Subsequent Visits After Telemedicine vs In-Person Care
JAMA Network Open · 2026-02-09
article · Open access
Importance: Telemedicine use increased during the COVID-19 pandemic and has remained a regular component of health care delivery. However, the financial implications of this change for health systems' reimbursement and utilization remain unclear. Objective: To compare 30-day episode charges and subsequent visits after telemedicine and in-person index visits. Design, Setting, and Participants: The target trial emulation conducted in this comparative effectiveness research included ambulatory in-person and telemedicine visit data from an academic health system comprising 5 hospitals in Pennsylvania from January 1 to April 30, 2024. Analyses focused on 10 high-volume clinical conditions commonly managed through telemedicine. Exposures: Telemedicine visits vs in-person visits. Main Outcomes and Measures: Outcomes included episode charges (the billed amount submitted for reimbursement to insurers and patients, excluding physician professional and facility fees for the index encounter) in an episode window from 7 days before to 30 days after the index visit and the number of subsequent visits within the episode window. Linear regression and Poisson regression with propensity score matching were conducted to adjust for demographic, clinical, socioeconomic, and contextual factors. Results: A total of 163 308 visits (108 383 [66.4%] among females; mean [SD] patient age, 49.2 [19.1] years) were included in this study. After propensity score matching, the mean 30-day episode charge was $96.60 (95% CI, $92.24-$100.96) for telemedicine encounters and $509.21 (95% CI, $500.65-$517.77) for in-person encounters (mean difference, $412.62; 95% CI, $403.01-$422.22). Additionally, telemedicine visits were associated with fewer follow-up visits per 30-day episode than were in-person visits (mean [SD], 3.44 [5.38] vs 4.44 [7.41] visits; comparative reduction, 23% [95% CI, 20%-26%]). For mental and behavioral disorders, episode charges for telemedicine vs in-person encounters were comparable in 3 categories: depressive disorders (-$69.47; 95% CI, -$100.90 to -$38.04), anxiety and fear-related disorders ($38.06; 95% CI, $23.14 to $52.99), and neurodevelopmental disorders (-$28.88; 95% CI, -$54.72 to -$3.04). Conclusions and Relevance: In this comparative effectiveness research using target trial emulation of outpatient telemedicine and in-person visits, telemedicine visits overall were associated with lower charges and fewer subsequent visits within the 30-day episode than were in-person visits. For mental and behavioral conditions, charges were comparable. These findings suggest that telemedicine may serve as a lower-charge alternative to in-person care without increasing the need for subsequent visits.
Current Topics in Learning and Development in Radiology: AJR Expert Panel Narrative Review
American Journal of Roentgenology · 2026-04-29
article
Radiology learning and development, encompassing both trainee education and lifelong professional learning, are undergoing rapid transformation driven by sustained growth in imaging volumes, persistent workforce constraints, increasing subspecialization, and evolving hybrid practice models that are reshaping radiologists' work. The resulting pressures increasingly and directly compete with protected time for case-based teaching, mentorship, feedback, and deliberate practice, placing the entire professional development cycle from training through practice at risk. Meanwhile, virtual and asynchronous educational platforms are expanding access to teaching conferences, case libraries, mobile resources, microlearning opportunities, and global collaboration while introducing challenges related to engagement, curation, feedback, and procedural skill expertise. Peer learning programs are also gaining momentum as an alternative to traditional score-based peer review, offering a structured nonpunitive approach to shared learning and practice improvement embedded within routine clinical work. In parallel, generative AI and large language models are creating new opportunities for knowledge synthesis and educational support while raising important concerns regarding accuracy, bias, outdated information, and overreliance. In this AJR Expert Panel Narrative Review, we examine how these converging trends are reshaping radiology learning and development and provide consensus guidance for strengthening educational quality while sustaining diagnostic excellence and professional growth in contemporary practice.
Radiology · 2026-02-01 · 1 citation
article
Academic Radiology · 2025-12-12 · 1 citation
article
Frequent coauthors
- Hanna M. Zafar, University of Pennsylvania Health System (44 shared)
- Florence X. Doo, University of Maryland, Baltimore County (36 shared)
- Judy Wawira Gichoya, Emory University (35 shared)
- Ameena Elahi, University of Pennsylvania (27 shared)
- Curtis P. Langlotz, Palo Alto University (26 shared)
- Kate Hanneman, University of Toronto (26 shared)
- Marie Larochelle, Université Laval (25 shared)
- Linda Moy (25 shared)
Awards & honors
- Fellow, Society for Imaging Informatics in Medicine (2019)
- Fellow, American College of Radiology (FACR)