
Rosa Arriaga
Associate Professor · Georgia Institute of Technology · Computer Science
Active 1996–2026
About
Dr. Rosa Arriaga is a Human-Computer Interaction (HCI) researcher in the School of Interactive Computing at Georgia Tech. She uses psychological concepts, theories, and methods to address fundamental topics of HCI and Social Computing. Her current research interests are in the area of chronic care management and mental health. She designs mHealth systems that address gaps in chronic care and mental health management. The computational systems she designs foster engagement, facilitate continuity of care, promote patient self-advocacy, and mediate communication between patients and healthcare providers.
Research topics
- Computer Science
- Psychology
- Artificial Intelligence
- Applied psychology
- Geography
- Multimedia
- Cognitive science
- Human–computer interaction
Selected publications
Designing with Medical Mistrust: Perspectives from Black Older Adults in Publicly Subsidized Housing
arXiv (Cornell University) · 2026-03-03
preprint · Open access · Senior author
Despite increasing interest in culturally-sensitive health technologies, medical mistrust remains largely unexplored within human-centered computing. Considered a social determinant of health, medical mistrust is the belief that healthcare providers or institutions are acting against one's best interest. This is a rational, protective response based on historical context, structural inequities, and discrimination. To center race-based medical mistrust and the lived experiences of Black older adults with low income, we conducted interviews within publicly subsidized housing in the Southern United States. Our reflexive themes describe community perspectives on health care and medical mistrust, including accreditation and embodiment, skepticism of financial motivations, and the intentions behind health AI. We provide a reflective exercise for researchers to consider their positionality in relation to community engagements, and reframe our findings through Black Feminist Thought to propose design principles for health self-management technologies for communities with historically grounded medical mistrust.
arXiv (Cornell University) · 2026-03-03
preprint · Open access · Senior author
Psychotherapy delivery relies on a negotiation between patient self-reports and clinical intuition. Growing evidence for technological support of psychotherapy suggests opportunities to aid the mediation of this tension. To explore this prospect, we designed a prototype of a clinical decision support system (CDSS) for treating veterans with post-traumatic stress disorder in a Prolonged Exposure (PE) therapy intensive outpatient program. We conducted a two-phase interview study to collect perspectives from practicing PE clinicians and former PE patients who are United States veterans. Our analysis distills opportunities for a CDSS (e.g., offering homework review at a glance, aiding patient conceptualization) and larger challenges related to context and deployment (e.g., navigating Veterans Affairs). By reframing our findings through three human-centered perspectives (distributed cognition, situated learning, infrastructural inversion), we highlight the complexities of designing a CDSS for psychotherapists in this context and offer theory-aligned design considerations.
Designing with Medical Mistrust: Perspectives from Black Older Adults in Publicly Subsidized Housing
2026-04-13 · 1 citation
article · Open access · Senior author
Despite increasing interest in culturally-sensitive health technologies, medical mistrust remains largely unexplored within human-centered computing. Considered a social determinant of health, medical mistrust is the belief that healthcare providers or institutions are acting against one’s best interest. This is a rational, protective response based on historical context, structural inequities, and discrimination. To center race-based medical mistrust and the lived experiences of Black older adults with low income, we conducted interviews within publicly subsidized housing in the Southern United States. Our reflexive themes describe community perspectives on health care and medical mistrust, including accreditation and embodiment, skepticism of financial motivations, and the intentions behind health AI. We provide a reflective exercise for researchers to consider their positionality in relation to community engagements, and reframe our findings through Black Feminist Thought to propose design principles for health self-management technologies for communities with historically grounded medical mistrust.
2026-04-21
article · Open access
Prolonged Exposure (PE) therapy is an effective treatment for post-traumatic stress disorder (PTSD), but evaluating therapist fidelity remains labor-intensive due to the need for manual review of session recordings. We present a method for the automatic temporal localization of key PE fidelity elements, identifying their start and stop times, directly from session audio and transcripts. Our approach fine-tunes a large pre-trained audio-language model, Qwen2-Audio, using Low-Rank Adaptation (LoRA) to process focused 30-second windows of audio-transcript input. Fidelity labels for three core protocol phases, therapist orientation (P1), imaginal exposure (P2), and post-imaginal processing (P3), are generated via LLM-based prompting and verified by trained raters. The model is trained to predict normalized boundary offsets using soft supervision guided by task-specific prompts. On a dataset of 308 real PE sessions, our best configuration (LoRA rank 8, 30s windows) achieves a mean absolute error (MAE) of 5.3s across tasks, within typical rater tolerance for timestamp review, enabling practical fidelity QC. We further analyze the effects of window size and LoRA rank, highlighting the importance of context granularity and model adaptation. This work introduces a privacy-preserving, scalable framework for fidelity tracking in PE therapy, with potential to support clinician training, supervision, and quality assurance.
AI Safety Training Can be Clinically Harmful
arXiv (Cornell University) · 2026-04-25
preprint · Open access
Large language models are being deployed as mental health support agents at scale, yet only 16% of LLM-based chatbot interventions have undergone rigorous clinical efficacy testing, and simulations reveal psychological deterioration in over one-third of cases. We evaluate four generative models on 250 Prolonged Exposure (PE) therapy scenarios and 146 CBT cognitive restructuring exercises (plus 29 severity-escalated variants), scored by a three-judge LLM panel. All models scored near-perfectly on surface acknowledgment (~0.91-1.00) while therapeutic appropriateness collapsed to 0.22-0.33 at the highest severity for three of four models, with protocol fidelity reaching zero for two. Under CBT severity escalation, one model's task completeness dropped from 92% to 71% while the frontier model's safety-interference score fell from 0.99 to 0.61. We identify a systematic, modality-spanning failure: RLHF safety alignment disrupts the therapeutic mechanism of action by grounding patients during imaginal exposure, offering false reassurance, inserting crisis resources into controlled exercises, and refusing to challenge distorted cognitions mentioning self-harm in PE; and through task abandonment or safety-preamble insertion during CBT cognitive restructuring. These findings motivate a five-axis evaluation framework (protocol fidelity, hallucination risk, behavioral consistency, crisis safety, demographic robustness), mapped onto FDA SaMD and EU AI Act requirements. We argue that no AI mental health system should proceed to deployment without passing multi-axis evaluation across all five dimensions.
JMIR Human Factors · 2026-01-03 · 1 citation
article · Open access · Senior author
Background: Digital health tools are increasingly prevalent in postoperative care management, yet limited research exists on digital health literacy and tool adoption among safety-net hospital populations. Understanding these factors is crucial for developing effective digital health solutions for historically underserved communities. Objective: This study aimed to evaluate digital health literacy, assess technology adoption readiness, and examine the relationship between patient-reported capabilities and demographic factors in a postoperative care context at a safety-net hospital. Methods: We conducted a mixed methods study with 71 postoperative patients and 29 health care providers at a safety-net hospital. Participants completed a modified eHealth Literacy Scale (eHEALS) assessment and a demographic questionnaire, followed by usability testing of PocketDoc, a digital health prototype. The modified 7-item eHEALS demonstrated adequate internal consistency (Cronbach α=0.77). Qualitative data from think-aloud protocols during usability testing were collected for future analysis. This study focused on quantitative assessments of digital health literacy (using the modified eHEALS on a 5-point Likert scale) and technology adoption readiness (via usability metrics on a 10-point Likert scale) analyzed using nonparametric statistical tests. Correlations between demographic factors and digital health literacy were examined using Spearman rank-order correlation. Results: Despite common assumptions about technology barriers in safety-net populations, 69% (49/71) of patients reported high confidence (score of ≥3 on a 5-point scale) in finding health resources online, and 61% (43/71) expressed confidence in using the internet for health-related questions. However, only 49% (35/71) felt confident in using digital resources for health decision-making.
Digital health literacy scores did not correlate with age or educational level, although 79% (56/71) of patients reported ≥10 years of digital device experience. Both patients and health care providers rated PocketDoc highly for ease of use (median 10, IQR 8-10) and task intuitiveness (median 10, IQR 8-10). Patients' confidence in finding and using health resources online positively correlated with interface satisfaction (ρ=0.262-0.304 and ρ=0.010-0.027, respectively). Conclusions: Our exploratory findings from 100 participants suggest that digital health tools may be more feasible in safety-net settings than previously considered, although the sample size and single-site design limit generalizability. However, the gap between patients' ability to find health resources (49/71, 69% confident) and their confidence in using these resources for health decision-making (35/71, 49% confident) highlights the need for targeted support in translating digital capabilities to health management skills.
2026-04-13 · 1 citation
article · Open access · Senior author
Psychotherapy delivery relies on a negotiation between patient self-reports and clinical intuition. Growing evidence for technological support of psychotherapy suggests opportunities to aid the mediation of this tension. To explore this prospect, we designed a prototype of a clinical decision support system (CDSS) for treating veterans with post-traumatic stress disorder in a Prolonged Exposure (PE) therapy intensive outpatient program. We conducted a two-phase interview study to collect perspectives from practicing PE clinicians and former PE patients who are United States veterans. Our analysis distills opportunities for a CDSS (e.g., offering homework review at a glance, aiding patient conceptualization) and larger challenges related to context and deployment (e.g., navigating Veterans Affairs). By reframing our findings through three human-centered perspectives (distributed cognition, situated learning, infrastructural inversion), we highlight the complexities of designing a CDSS for psychotherapists in this context and offer theory-aligned design considerations.
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies · 2025-12-02
article · Senior author
Imagined visions of the future underlie much of the history of ubiquitous computing. Critiques of speculative design approaches, however, amplify concerns of ubicomp's inherent focus on the future, inciting questions of who is futuring and how to amplify historically marginalized voices. Taking a research through design approach, we conducted participatory speculative design workshops focusing on diabetes self-monitoring technology, within under-community sites which predominantly serve Black older adults. We explore community member perspectives on three modalities of ubiquitous health technologies: smarthome, wearable, and smartphone application. While much previous research focuses on diabetes technologies, we explore an area which is currently understudied by the ubicomp community: diabetic foot disease monitoring. Further, we center a community which faces greater diabetes health disparities. We provide findings related to health priorities and values of community members, and their broader considerations regarding current and speculative (AI-based) technologies. We reflect on the tensions between current clinical standards of care and the participant agency afforded by participatory design. Finally, we discuss the ways in which participants' views of current and speculative technologies contrast with ubicomp's as a field, specifically surrounding temporality and sociohistorical context.
2025-10-26
article · Senior author
Diabetic foot ulcers, a life-threatening complication of diabetes, take a disproportionate toll on communities of color; however, these communities are currently underrepresented in dermatologic and wound image datasets. Further, many of these datasets were collected under controlled conditions, limiting the transferability of ulcer recognition models to naturalistic settings. In support of more equitable and generalizable computational modeling, we detail our two-year effort to create the first repository of diabetic foot ulcer images collected predominantly from patients of color in naturalistic settings. We conduct an evaluation of state-of-the-art foot ulcer segmentation and classification methods using our dataset of 3,362 foot images collected from 252 patients, and provide evidence that current ulcer recognition models result in insufficient performance: the best performing baseline model (Mask R-CNN) has been previously reported to achieve a Dice score of 90.2%, but achieves only 39.5% on our more naturalistic dataset from patients of color. We propose and evaluate a new pipeline which improves segmentation performance, including an ulcer detection model and a foundational segmentation model (Segment Anything 2 UNet) tailored to communities of color and specifically aiming for naturalistic assessment scenarios. We release our image dataset to support the development of larger, more diverse datasets, and ultimately more equitable models for diabetic foot care.
Recent grants
SCH: INT: Prolonged Exposure Collective Sensing System (PECSS) for PTSD
NSF · $1.2M · 2019–2025
Frequent coauthors
- Gregory D. Abowd · Northeastern University · 34 shared
- Keiichi Yasumoto · Nara Institute of Science and Technology · 25 shared
- Hwajung Hong · 11 shared
- U. K. Lakshmi · 10 shared
- Jennifer Mankoff · University of Washington · 9 shared
- Fatima A. Boujarwah · Kuwait University · 8 shared
- Tae-Jung Yun · Samsung (South Korea) · 7 shared
- Hayley I. Evans · Georgia Institute of Technology · 7 shared
Education
- 1992 · Ph.D., Computer Science · Massachusetts Institute of Technology
- 1988 · M.S., Computer Science · Massachusetts Institute of Technology
- 1985 · B.S., Computer Science · University of Texas at Austin