
Mauricio Santillana
Northeastern University · Electrical and Energy Engineering
Active 1975–2026
About
Mauricio Santillana, PhD, MSc, is a professor in the Physics and Electrical and Computer Engineering Departments at Northeastern University and the director of the Machine Intelligence Research Lab in the Network Science Institute. His research areas include modeling geographic patterns of population growth, modeling fluid flow to inform coastal flood simulations, global atmospheric pollution transport models, and the design and implementation of disease outbreak prediction platforms. His work has demonstrated that machine learning techniques can effectively monitor and predict disease outbreak dynamics using novel data sources such as Internet search activity, social media posts, clinician searches, human mobility, and weather data. His research has been published in prominent journals including Nature, Science, Proceedings of the National Academy of Sciences, Science Advances, Nature Communications, and Nature Climate Change. His research has received funding from organizations such as the National Institute of General Medical Sciences of the National Institutes of Health (NIH), the U.S. Centers for Disease Control and Prevention, the Bill and Melinda Gates Foundation, and other foundations. Santillana's work involves developing machine intelligence analytics tools aimed at predicting unobserved events in epidemiology and healthcare, tracking disease outbreaks globally, and exploring the influence of climate change and socio-economic factors on health outcomes.
Research topics
- Medicine
- Sociology
- Political Science
- Internal medicine
- Environmental health
- Virology
- Nursing
- Computer Science
- Demography
- Geography
- Psychiatry
- Psychology
- Pediatrics
- Law
- Economics
- Data science
- Pathology
- Gender studies
- Business
- Social psychology
- Biology
- Internet privacy
- Socioeconomics
- Public relations
Selected publications
Generative AI Use and Depressive Symptoms Among US Adults
JAMA Network Open · 2026-01-21 · 3 citations
Article · Open access
Importance: Generative artificial intelligence (AI) has rapidly entered mainstream use in the US, but its association with mental health has not been characterized. Objective: To examine the associations of the extent and type of generative AI use among US adults with negative affective symptoms in a large, nationally representative sample. Design, Setting, and Participants: This survey study used data from a 50-state US internet nonprobability survey conducted between April and May 2025. Survey respondents were aged 18 years and older. Data were analyzed in August 2025. Exposure: Participants self-reported generative AI and social media use. Main Outcomes and Measures: The outcome of interest, negative affect, was measured using the Patient Health Questionnaire 9-item (PHQ-9). Results: There were 20 847 unique participants, with mean (SD) age 47.3 (17.1) years and 10 327 (49.5%) female, 10 386 (49.8%) male, and 134 (0.6%) nonbinary participants; 2152 participants (10.3%) reported using AI at least daily, including 1053 participants (5.1%) who reported daily use and 1099 participants (5.3%) who reported use multiple times per day. Among participants who used AI daily or more frequently, 1033 (48.0%) reported use for work, 246 (11.4%) for school, and 1875 (87.1%) for personal applications. In survey-weighted regression models, daily or more frequent AI use was significantly more common among men, younger adults, those with higher education and income, and those in urban settings. Greater AI use was associated with greater levels of depressive symptoms in sociodemographic-adjusted regression models (daily use: β = 1.08 [95% CI, 0.55-1.62]; multiple times per day: β = 0.86 [95% CI, 0.35-1.37]) compared with nonuse, and with greater likelihood of reporting at least moderate depressive symptoms (odds ratio [OR], 1.29 [95% CI, 1.15-1.46]); similar patterns were observed for anxiety and irritability.
The highest estimates were observed among individuals using AI for personal use (β = 0.31 [95% CI, 0.10-0.52]) and those aged 25 to 44 years (β = 1.22 [95% CI, 0.70-1.74]) or 45 to 64 years (β = 1.38 [95% CI, 0.72-2.05]). Conclusions and Relevance: This survey study found that AI use was significantly associated with greater depressive symptoms, with magnitude of differences varying by age group. Further work is needed to understand whether these associations are causal and explain heterogeneous effects.
Antidepressant use among American adults in a 50-state survey
BMJ Mental Health · 2026-01-01
Article · Open access
BACKGROUND: Antidepressants are among the most prescribed medications in the USA, yet challenges in access to mental health treatment persist. OBJECTIVE: To assess current and lifetime antidepressant and psychotherapy use among American adults, and to examine attitudes towards potential federal restrictions on antidepressant prescribing. METHODS: We conducted a cross-sectional survey study using data from a national non-probability internet-based panel weighted to approximate national demographics (age, gender, race and ethnicity, education, US census region, and urbanicity) based on 2020 US Census data. Data were collected between 10 April and 27 May 2025 from 30 810 adults residing in the USA. The primary outcomes were self-reported current and past antidepressant and psychotherapy use, and support for or opposition to potential federal restrictions on antidepressant prescribing. Logistic regression models estimated demographic and treatment-related features associated with these outcomes. FINDINGS: Among 30 115 respondents with complete antidepressant data, 16.6% reported current antidepressant use, and of 30 098 respondents with psychotherapy data, 10.4% reported current psychotherapy. Use of both treatments was significantly greater among White respondents compared with all other racial groups. When asked about potential federal restrictions on doctors prescribing antidepressants, 16.4% of respondents supported and 48.0% opposed such regulation, with lesser opposition among those of male gender (OR 0.69, 95% CI 0.65 to 0.73), and greater opposition among those with lifetime antidepressant treatment (OR 2.37, 95% CI 2.21 to 2.54). CONCLUSIONS: Antidepressant and psychotherapy use remains unevenly distributed across demographic groups. A significant proportion of adults in every US state oppose efforts to restrict access to antidepressant prescribing, reflecting broad public support for maintaining access to treatment.
CLINICAL IMPLICATIONS: Findings from this study suggest that restrictive policies on antidepressant prescribing are unlikely to align with public sentiment and may risk exacerbating existing inequities in care.
Journal of Mood and Anxiety Disorders · 2026-01-13 · 1 citation
Article · Open access
Large and persistent sociodemographic disparities in rates of mental health treatment in the United States have been reported, but whether these differences reflect institutional mistrust or limited social support remains unclear. This study described current treatment use among American adults with moderate-to-severe depressive or anxiety symptoms and examined whether trust in health care institutions and availability of emotional support were associated with lack of treatment. A cross-sectional analysis was conducted using data from a nationally distributed, web-based opinion survey of 9733 American adults with moderate-to-severe depressive or anxiety symptoms (Patient Health Questionnaire-9 score ≥10 and/or Generalized Anxiety Disorder-2 score ≥3). The survey was fielded April 10th-28th, 2025, using quota sampling for age, gender, race, ethnicity, education, U.S. census region, and urbanicity; post-stratification weights approximated the U.S. adult population. The primary outcome was no current mental health treatment (neither antidepressant nor psychotherapy use). Weighted logistic regression estimated odds ratios for treatment absence by sociodemographic characteristics; trust in physicians and hospitals, scientists and researchers, the Centers for Disease Control and Prevention, and pharmaceutical companies; and emotional support. Among 9733 adults with elevated symptoms, 66.3% reported no current treatment. Racial and ethnic minority groups, men, and those born outside the United States had higher odds of being untreated, while public insurance predicted lower odds. Lower trust in doctors and hospitals, lower trust in science, and lack of emotional support each independently predicted treatment absence, but inclusion of these variables did not meaningfully attenuate sociodemographic disparities.
medRxiv · 2026-05-19
Article · Open access · Senior author
Wastewater-based surveillance (WBS) is increasingly used to monitor infectious disease dynamics, yet most evaluations focus on correlation or forecasting, neither of which directly assesses whether wastewater signals can identify the epidemiological events most relevant to public health decision-making. We argue that outbreak onset and epidemic peak detection are the operationally critical use cases of WBS, requiring a fundamentally different evaluation framework. We introduce a classification-based framework that treats WBS as an event-detection problem, defining outbreaks and peaks as discrete events, establishing detection intervals to account for timing uncertainty, and incorporating censoring and data completeness criteria for valid comparisons against imperfect clinical reference outcomes. Within this framework, we apply a Bayesian exponential growth model for outbreak detection, benchmarked against a standard reproductive number (Rt)-based method, and a rule-based algorithm for peak detection, evaluating performance via sensitivity and positive predictive value (PPV). Applied to county-level SARS-CoV-2 wastewater data from 281 U.S. counties (Biobot, 2021–2024), the exponential growth approach substantially outperforms the Rt-based baseline: sensitivity 0.82 and PPV 0.64 versus sensitivity 0.58 and PPV 0.19 for the best-performing Rt variant. Peak detection achieves sensitivity 0.84 and PPV 0.70 at the county level. Both peak and outbreak detection achieve strong and consistent performance against hospitalizations and deaths at the state level. Spatial aggregation yields a statistically significant improvement in peak detection PPV against a curated reference standard (p < 0.001), while outbreak detection improvements under aggregation are directionally consistent but not statistically significant.
Wastewater leads case-defined outbreaks by 4–6 days but minimally leads epidemic peaks, consistent with wastewater approximating prevalence rather than incidence. These findings demonstrate that wastewater signals can reliably detect outbreak onset and epidemic peaks across spatial scales and clinical outcomes, and that the choice of detection method matters substantially in practice. The classification framework developed here provides a reusable and principled tool for evaluating any surveillance signal as an event-detection system, with direct relevance to how WBS is actually used in public health decision-making.
Highlights
- We evaluate wastewater surveillance as an event-detection system for outbreak onset and epidemic peak timing.
- We introduce a classification-based framework that accounts for timing uncertainty, censoring, and data completeness.
- Wastewater signals detect case-defined outbreaks and peaks with strong sensitivity and positive predictive value across spatial scales.
- Peak and outbreak detection show modest gains under aggregation, particularly for noisier outcomes such as deaths.
- The proposed framework provides a reusable approach for evaluating surveillance signals against epidemiologically meaningful events.
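To make the sensitivity and PPV scoring of detected events more concrete, here is a minimal Python sketch. It is not the authors' framework: it assumes event times are day indices, pairs each reference event with at most one detection inside a symmetric window (the hypothetical `tolerance` parameter stands in for the paper's detection intervals), and ignores the censoring and data-completeness rules the abstract describes. The functions `count_matches` and `sensitivity_ppv` are illustrative names only.

```python
from typing import Sequence, Tuple


def count_matches(detected: Sequence[float], reference: Sequence[float],
                  tolerance: float) -> int:
    """Greedily pair each reference event (in time order) with the earliest
    unused detection that falls within +/- tolerance of it."""
    unused = sorted(detected)
    matches = 0
    for ref in sorted(reference):
        for i, det in enumerate(unused):
            if abs(det - ref) <= tolerance:
                matches += 1
                del unused[i]  # each detection may count only once
                break
    return matches


def sensitivity_ppv(detected: Sequence[float], reference: Sequence[float],
                    tolerance: float) -> Tuple[float, float]:
    """Sensitivity = matched reference events / all reference events.
    PPV = matched detections / all detections."""
    tp = count_matches(detected, reference, tolerance)
    sensitivity = tp / len(reference) if reference else float("nan")
    ppv = tp / len(detected) if detected else float("nan")
    return sensitivity, ppv


# Hypothetical event days, for illustration only.
sens, ppv = sensitivity_ppv([10, 40, 95, 150], [12, 60, 100], tolerance=7)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

With these made-up event days and a 7-day window, two of the three reference events are matched and two of the four detections are true, so sensitivity is 2/3 and PPV is 0.5. The greedy pairing is a simplification; an optimal one-to-one assignment could score differently when windows overlap.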
Restoring the forecasting power of Google Trends with statistical preprocessing
International Journal of Forecasting · 2026-03-31 · 1 citation
Article
bioRxiv (Cold Spring Harbor Laboratory) · 2026-05-01
Article · Open access · Senior author
Vaccine strain selection for seasonal influenza A(H3N2) depends on knowing which hemagglutinin (HA) substitutions are most likely to erode neutralizing antibody recognition, yet published antigenic site sets disagree substantially on which positions matter most. We applied interpretable gradient-boosted tree models with SHAP-based site attribution to two complementary hemagglutination inhibition (HI) datasets to produce a more consolidated ranking of candidate antigenic positions. Models trained on a Neher/Bedford benchmark dataset recover the canonical cluster-transition sites established by prior analyses. Moreover, after filtering the WIC dataset for confounding factors, our models recover the majority of positions from four major prior reference sets (Koel, Neher/Bedford, Harvey, and Shah) and improve concordance between rankings derived from the Neher/Bedford and WIC datasets. Rankings from our models also agree more strongly with models trained to predict sampling time or passage identity than with standard evolutionary metrics used to detect diversifying selection. Our results show that interpretable sequence-based models can provide a more integrative ranking of candidate antigenic positions across different data sources and modeling approaches. This work should aid efforts to prioritize H3N2 substitutions for epidemic surveillance.
Significance Statement
Every year, health authorities must update the seasonal flu vaccine to account for mutations in influenza A(H3N2) that allow the virus to escape existing immunity. Knowing which specific positions in the hemagglutinin protein drive this immune escape is essential for evaluating newly emerging variants, but published studies disagree substantially on which positions matter most.
We show that interpretable machine learning models applied to two hemagglutination inhibition datasets, the Neher/Bedford benchmark dataset and the larger WHO Collaborating Centre dataset, can help to resolve the disagreements. The models recover canonical cluster-transition sites from the Neher/Bedford benchmark data, and show that our analysis approach with the WIC data improves concordance across several prior rankings produced from distinct datasets and modeling approaches. The resulting rankings provide a practical, consolidated reference for prioritizing hemagglutinin mutations most likely to affect vaccine effectiveness.
The Lancet Regional Health - Americas · 2026-01-01
Article · Open access · Senior author
Background: Influenza and respiratory syncytial virus (RSV) are major contributors to the burden of seasonal influenza-like illnesses (ILI) in the US. Prevention and treatment options for ILI vary substantially across age groups, as do their costs and administration schedules. This study aimed to characterize the timing and ordering of RSV, influenza, and COVID-19 epidemics in the post-pandemic period to inform public health preparedness. Methods: We implemented a series of independent regression models to infer the contribution of each of these diseases to seasonal ILI syndromic indicators. We further implemented anomaly-detection algorithms on data from the US Centers for Disease Control and Prevention National Syndromic Surveillance Program for the 2022-23, 2023-24, and 2024-25 ILI seasons to identify the timing of onsets and peaks of RSV, influenza, and COVID-19. Findings: A total of 148 state-ILI seasons were analyzed. In 114 of the 148 analyzed seasons (77.0%), the volume of RSV emergency department (ED) visits peaked before that of influenza ED visits. The median time difference between peaks of RSV and peaks of influenza was +3.0 weeks (95% percentile range: -7.0 to +7.0 weeks; interquartile range: 5.0 weeks). The onsets of RSV and influenza were found to occur more synchronously in the 2023-24 and 2024-25 ILI seasons. The timing of COVID-19 outbreaks did not show a consistent seasonal pattern across the study period. Interpretation: RSV epidemics frequently reach peak volume before influenza epidemics across the US. Healthcare professionals and public health authorities should anticipate increases in RSV cases and hospitalizations at the start of the annual ILI season and establish infrastructure and planning to handle incoming surges of both RSV and influenza appropriately. Funding: CDC Center for Forecasting and Outbreak Analytics; National Institutes of Health.
Journal of the Pediatric Infectious Diseases Society · 2026-05-06
Article · Senior author
Infectious Disease Modelling · 2026-05-01
Article · Open access · Senior author · Corresponding author
Epidemic models face a critical challenge: surveillance systems capture only a fraction of infections (often <10%). We reveal two fundamental problems. First, when models ignore underdetection entirely, treating detected cases as complete, parameter errors exceed 1000% despite visually reasonable fits. Second, when models explicitly account for underdetection by including case detection ratios as unknown parameters, structural identifiability analysis proves that transmission rates and detection ratios become mathematically confounded, rendering infinitely many epidemiologically distinct scenarios equally plausible from case data alone. Integrating even a single population-level seroprevalence measurement resolves both problems by independently constraining cumulative exposure. Through Bayesian inference on synthetic SIR data, we demonstrate that this approach reduces parameter uncertainty by orders of magnitude, enabling accurate inference of transmission dynamics, peak timing, and outbreak size under realistic noise. Our framework establishes serological surveillance integration as both a mathematical necessity and a strategic investment for pandemic preparedness.
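Why a single serosurvey helps can be illustrated with simple arithmetic. The Python sketch below is a deliberate simplification, not the authors' Bayesian SIR framework: it assumes a serology assay with perfect sensitivity and specificity, no antibody waning, and no reinfection, so that seroprevalence times population approximates cumulative infections. The function names and every number are hypothetical. Case counts alone only fix the product of the detection ratio and the number of infections, so each candidate detection ratio implies a different epidemic size; the serosurvey supplies the missing denominator.

```python
def implied_infections(cumulative_detected: float, detection_ratio: float) -> float:
    """Total infections implied by case counts for an assumed detection ratio."""
    return cumulative_detected / detection_ratio


def detection_ratio_from_serology(cumulative_detected: float,
                                  seroprevalence: float,
                                  population: float) -> float:
    """Estimate the case detection ratio from one seroprevalence measurement.

    Assumes seroprevalence * population equals cumulative infections
    (perfect assay, no waning, no reinfection)."""
    if not 0 < seroprevalence <= 1:
        raise ValueError("seroprevalence must be in (0, 1]")
    return cumulative_detected / (seroprevalence * population)


cases = 5_000  # hypothetical cumulative detected cases

# Case data alone: each assumed detection ratio implies a different epidemic size.
for rho in (0.02, 0.05, 0.10):
    print(f"rho={rho:.2f} -> {implied_infections(cases, rho):,.0f} infections")

# One hypothetical serosurvey (8% seroprevalence in 1,000,000 people) pins rho down.
rho_hat = detection_ratio_from_serology(cases, 0.08, 1_000_000)
print(f"rho from serology = {rho_hat:.4f}")
```

In this made-up scenario the serosurvey implies about 80,000 infections and therefore a detection ratio of 0.0625, well under the 10% figure the abstract cites as common. A real analysis would propagate assay error and timing into the uncertainty, which this sketch does not attempt.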
The R = 1 threshold can misclassify epidemic stability
Communications Physics · 2026-04-15
Article · Open access
The effective reproduction number, R, is a predominant statistic for tracking infectious disease spread and informing health policies. An estimated R = 1 is universally interpreted as a stability threshold distinguishing epidemic growth (R > 1) from control (R < 1). We demonstrate that this interpretation frequently fails because R typically averages over groups with heterogeneous characteristics. We find that R = 1 conceals valuable early-warning signals of resurgence and misclassifies complex dynamics as noise, generating false positive stability thresholds that diminish predictive and policymaking value. We further illustrate that a popular alternative transmissibility definition (using next-generation matrices) overcorrects this issue, producing false negative stability signals by amplifying stochastic variation. We address these limitations by adapting a recently developed statistic, E, derived from R using experimental design theory. We show that E tightly constrains the set of scenarios consistent with stability while remaining robust to noise, and we establish E = 1 as a more practical and meaningful real-time threshold.
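The averaging problem can be seen in a toy example. The Python sketch below illustrates only that point; it is not the paper's method and does not compute the E statistic. It assumes two groups that do not mix, each with a fixed hypothetical reproduction number, and advances incidence one generation at a time. The aggregate R, taken here as the ratio of total incidence in consecutive generations, equals the incidence-weighted average of the group values. The function `two_group_generations` is a hypothetical name.

```python
from typing import List, Tuple


def two_group_generations(r_a: float, r_b: float, init_a: float, init_b: float,
                          generations: int) -> Tuple[List[float], List[float]]:
    """Advance incidence in two non-mixing groups with constant reproduction
    numbers r_a and r_b. Returns total incidence per generation and the
    aggregate R between consecutive generations (total[g + 1] / total[g])."""
    a, b = float(init_a), float(init_b)
    totals = [a + b]
    for _ in range(generations):
        a, b = r_a * a, r_b * b
        totals.append(a + b)
    aggregate_r = [nxt / cur for cur, nxt in zip(totals, totals[1:])]
    return totals, aggregate_r


# Hypothetical: one growing group (R = 1.5) and one shrinking group (R = 0.5),
# starting with equal incidence.
totals, agg_r = two_group_generations(1.5, 0.5, 100, 100, generations=3)
print(totals)                        # [200.0, 200.0, 250.0, 350.0]
print([round(r, 2) for r in agg_r])  # [1.0, 1.25, 1.4]
```

The first aggregate value is exactly 1, the conventional stability threshold, yet total incidence then rises because the group with R > 1 comes to dominate. This is one simple way an averaged R = 1 can hide an early signal of resurgence.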
Frequent coauthors
- Roy H. Perlis (349 shared)
- Matthew A. Baum (319 shared)
- Katherine Ognyanova, Rutgers Sexual and Reproductive Health and Rights (308 shared)
- James N. Druckman, University of Rochester (306 shared)
- David Lazer, Northeastern University (281 shared)
- Alexi Quintana, Northeastern University (213 shared)
- Jon Green, Duke University (187 shared)
- Jennifer Lin, Northwestern University (183 shared)
Awards & honors
- Stanford University Top 2% Most-Cited Scientists (2023, 2024…
- Atlas of Inspiring Hispanic/Latinx Scientists (2024)