Patrick Flaherty

Associate Professor · Verified

University of Massachusetts Amherst · Mathematics and Statistics

Active 1984–2025

h-index: 26
Citations: 8.6k
Papers: 76 (21 in the last 5 years)
Funding: $693k
About

Patrick Flaherty is an Associate Professor in the Department of Mathematics and Statistics at the University of Massachusetts Amherst. His research focuses on developing statistical models and scalable algorithms to interpret massive biomedical data sets, particularly in the context of large-scale genomic data. His work aims to address the need for statistically rigorous and computationally efficient methods to analyze complex data generated by advances in DNA sequencing technology, with the goal of improving patient care. His research spans diverse fields including machine learning, bioinformatics, statistics, and genetics. Flaherty's long-term goal is to enable the interpretation of genomic changes that drive disease development through innovative statistical and computational approaches.

Research topics

  • Computer Science
  • Biology
  • Chemistry
  • Genetics
  • Data Mining
  • Internal medicine
  • Bioinformatics
  • Medicine
  • Computational biology
  • Biochemistry
  • Surgery
  • Programming language

Selected publications

  • Stress testing reveals selective vulnerabilities in protein homeostasis

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-06-16

    preprint · Open access

    Protein quality control (PQC) systems are essential for cellular resilience to proteotoxic stress. Despite intensive study for decades, functional redundancies in the system obscure the contributions of the collectively important individual genes. Here, we leverage transposon sequencing across bacteria strains lacking key chaperones and proteases to reveal hidden determinants of stress response in protein homeostasis. By profiling fitness under multiple proteotoxic stresses, we uncover stress-specific vulnerabilities and reveal how major players of PQC mask correlations between transcriptomic responses and gene fitness. As an illustration of unexpected connections, we identify a heat-specific synthetic lethality between the disaggregase ClpB and DNA Polymerase I (PolA) mediated by persistent aggregation of the RecA recombinase and toxic persistence of the heat shock regulon. Our findings reveal that stress-induced aggregation is not broadly toxic. Rather, it becomes lethal in specific genetic or environmental contexts due to the depletion of components only needed in those specific circumstances. This work presents a framework to reveal normally hidden fragility in stress responses using gene fitness scores adaptable to a variety of systems.

  • Development of a biomarker prediction model for post-trauma multiple organ failure/dysfunction syndrome based on the blood transcriptome

    Annals of Intensive Care · 2024-01-01 · 3 citations

    article · Open access

    BACKGROUND: Multiple organ failure/dysfunction syndrome (MOF/MODS) is a major cause of mortality and morbidity among severe trauma patients. Current clinical practices entail monitoring physiological measurements and applying clinical score systems to diagnose its onset. Instead, we aimed to develop an early prediction model for MOF outcome evaluated soon after traumatic injury by performing machine learning analysis of genome-wide transcriptome data from blood samples drawn within 24 h of traumatic injury. We then compared its performance to baseline injury severity scores and detection of infections. METHODS: Buffy coat transcriptome and linked clinical datasets from blunt trauma patients from the Inflammation and the Host Response to Injury Study ("Glue Grant") multi-center cohort were used. According to the inclusion/exclusion criteria, 141 adult (age ≥ 16 years old) blunt trauma patients (excluding penetrating) with early buffy coat (≤ 24 h since trauma injury) samples were analyzed, with 58 MOF-cases and 83 non-cases. We applied the Least Absolute Shrinkage and Selection Operator (LASSO) and eXtreme Gradient Boosting (XGBoost) algorithms to select features and develop models for MOF early outcome prediction. RESULTS: The LASSO model included 18 transcripts (AUROC [95% CI]: 0.938 [0.890-0.987] (training) and 0.833 [0.699-0.967] (test)), and the XGBoost model included 41 transcripts (0.999 [0.997-1.000] (training) and 0.907 [0.816-0.998] (test)). There were 16 overlapping transcripts comparing the two panels (0.935 [0.884-0.985] (training) and 0.836 [0.703-0.968] (test)). The biomarker models notably outperformed models based on injury severity scores and sex, which we found to be significantly associated with MOF (APACHEII + sex-0.649 [0.537-0.762] (training) and 0.493 [0.301-0.685] (test); ISS + sex-0.630 [0.516-0.744] (training) and 0.482 [0.293-0.670] (test); NISS + sex-0.651 [0.540-0.763] (training) and 0.525 [0.335-0.714] (test)). 
    CONCLUSIONS: The accurate assessment of MOF from blood samples immediately after trauma is expected to aid in improving clinical decision-making and may contribute to reduced morbidity, mortality and healthcare costs. Moreover, understanding the molecular mechanisms involving the transcripts identified as important for MOF prediction may eventually aid in developing novel interventions.

  • Doubly Non-Central Beta Matrix Factorization for Stable Dimensionality Reduction of Bounded Support Matrix Data

    arXiv (Cornell University) · 2024-10-24

    preprint · Open access

    We consider the problem of developing interpretable and computationally efficient matrix decomposition methods for matrices whose entries have bounded support. Such matrices are found in large-scale DNA methylation studies and many other settings. Our approach decomposes the data matrix into a Tucker representation wherein the number of columns in the constituent factor matrices is not constrained. We derive a computationally efficient sampling algorithm to solve for the Tucker decomposition. We evaluate the performance of our method using three criteria: predictability, computability, and stability. Empirical results show that our method has similar performance as other state-of-the-art approaches in terms of held-out prediction and computational complexity, but has significantly better performance in terms of stability to changes in hyper-parameters. The improved stability results in higher confidence in the results in applications where the constituent factors are used to generate and test scientific hypotheses such as DNA methylation analysis of cancer samples.

  • Discovering Genetic Modulators of the Protein Homeostasis System through Multilevel Analysis

    bioRxiv (Cold Spring Harbor Laboratory) · 2024-02-29 · 1 citation

    preprint · Open access · Senior author · Corresponding

    Every protein progresses through a natural lifecycle from birth to maturation to death; this process is coordinated by the protein homeostasis system. Environmental or physiological conditions trigger pathways that maintain the homeostasis of the proteome. An open question is how these pathways are modulated to respond to the many stresses that an organism encounters during its lifetime. To address this question, we tested how the fitness landscape changes in response to environmental and genetic perturbations using directed and massively parallel transposon mutagenesis in Caulobacter crescentus. We developed a general computational pipeline for the analysis of gene-by-environment interactions in transposon mutagenesis experiments. This pipeline uses a combination of general linear models (GLMs), statistical knockoffs, and a nonparametric Bayesian statistical model to identify essential genetic network components that are shared across environmental perturbations. This analysis allows us to quantify the similarity of proteotoxic environmental perturbations from the perspective of the fitness landscape. We find that essential genes vary more by genetic background than by environmental conditions, with limited overlap among mutant strains targeting different facets of the protein homeostasis system. We also identified 146 unique fitness determinants across different strains, with 19 genes common to at least two strains, showing varying resilience to proteotoxic stresses. Experiments exposing cells to a combination of genetic perturbations and dual environmental stressors show that perturbations that are quantitatively dissimilar from the perspective of the fitness landscape are likely to have a synergistic effect on the growth defect.

    Significance Statement: This study provides critical insights into how cells adapt to environmental and genetic challenges affecting protein homeostasis. Using multilevel statistical analysis and transposon mutagenesis, we find that a model organism, Caulobacter crescentus, lacks a universal redundancy mechanism for coping with stress, as evidenced by the limited overlap in essential genes across different environmental and genetic perturbations. Our methods also pinpoint key fitness determinants and enable the prediction of perturbation combinations that synergistically affect cell growth.

  • Discovering genetic modulators of the protein homeostasis system through multilevel analysis

    PNAS Nexus · 2024-12-23

    article · Open access · Senior author

    Every protein progresses through a natural lifecycle from birth to maturation to death; this process is coordinated by the protein homeostasis system. Environmental or physiological conditions trigger pathways that maintain the homeostasis of the proteome. An open question is how these pathways are modulated to respond to the many stresses that an organism encounters during its lifetime. To address this question, we tested how the fitness landscape changes in response to environmental and genetic perturbations using directed and massively parallel transposon mutagenesis in Caulobacter crescentus. We developed a general computational pipeline for the analysis of gene-by-environment interactions in transposon mutagenesis experiments. This pipeline uses a combination of general linear models, statistical knockoffs, and a nonparametric Bayesian statistical model to identify essential genetic network components that are shared across environmental perturbations. This analysis allows us to quantify the similarity of proteotoxic environmental perturbations from the perspective of the fitness landscape. We find that essential genes vary more by genetic background than by environmental conditions, with limited overlap among mutant strains targeting different facets of the protein homeostasis system. We also identified 146 unique fitness determinants across different strains, with 19 genes common to at least two strains, showing varying resilience to proteotoxic stresses. Experiments exposing cells to a combination of genetic perturbations and dual environmental stressors show that perturbations that are quantitatively dissimilar from the perspective of the fitness landscape are likely to have a synergistic effect on the growth defect.

  • A preventive tool for predicting bloodstream infections in children with burns

    Shock · 2023-01-04 · 14 citations

    article · Open access

    Introduction: Despite significant advances in pediatric burn care, bloodstream infections (BSIs) remain a compelling challenge during recovery. A personalized medicine approach for accurate prediction of BSIs before they occur would contribute to prevention efforts and improve patient outcomes. Methods: We analyzed the blood transcriptome of severely burned (total burn surface area [TBSA] ≥20%) patients in the multicenter Inflammation and Host Response to Injury ("Glue Grant") cohort. Our study included 82 pediatric (aged <16 years) patients, with blood samples at least 3 days before the observed BSI episode. We applied the least absolute shrinkage and selection operator (LASSO) machine-learning algorithm to select a panel of biomarkers predictive of BSI outcome. Results: We developed a panel of 10 probe sets corresponding to six annotated genes (ARG2 [arginase 2], CPT1A [carnitine palmitoyltransferase 1A], FYB [FYN binding protein], ITCH [itchy E3 ubiquitin protein ligase], MACF1 [microtubule actin crosslinking factor 1], and SSH2 [slingshot protein phosphatase 2]), two uncharacterized (LOC101928635, LOC101929599), and two unannotated regions. Our multibiomarker panel model yielded highly accurate prediction (area under the receiver operating characteristic curve, 0.938; 95% confidence interval [CI], 0.881-0.981) compared with models with TBSA (0.708; 95% CI, 0.588-0.824) or TBSA and inhalation injury status (0.792; 95% CI, 0.676-0.892). A model combining the multibiomarker panel with TBSA and inhalation injury status further improved prediction (0.978; 95% CI, 0.941-1.000). Conclusions: The multibiomarker panel model yielded a highly accurate prediction of BSIs before their onset. Knowing patients' risk profile early will guide clinicians to take rapid preventive measures for limiting infections, promote antibiotic stewardship that may aid in alleviating the current antibiotic resistance crisis, shorten hospital length of stay and burden on health care resources, reduce health care costs, and significantly improve patients' outcomes. In addition, the biomarkers' identity and molecular functions may contribute to developing novel preventive interventions.

  • Identification of significant gene expression changes in multiple perturbation experiments using knockoffs

    Briefings in Bioinformatics · 2023-02-18 · 6 citations

    article · Open access · Senior author · Corresponding

    Large-scale multiple perturbation experiments have the potential to reveal a more detailed understanding of the molecular pathways that respond to genetic and environmental changes. A key question in these studies is which gene expression changes are important for the response to the perturbation. This problem is challenging because (i) the functional form of the nonlinear relationship between gene expression and the perturbation is unknown and (ii) identification of the most important genes is a high-dimensional variable selection problem. To deal with these challenges, we present here a method based on the model-X knockoffs framework and Deep Neural Networks to identify significant gene expression changes in multiple perturbation experiments. This approach makes no assumptions on the functional form of the dependence between the responses and the perturbations and it enjoys finite sample false discovery rate control for the selected set of important gene expression responses. We apply this approach to the Library of Integrated Network-Based Cellular Signature data sets which is a National Institutes of Health Common Fund program that catalogs how human cells globally respond to chemical, genetic and disease perturbations. We identified important genes whose expression is directly modulated in response to perturbation with anthracycline, vorinostat, trichostatin-a, geldanamycin and sirolimus. We compare the set of important genes that respond to these small molecules to identify co-responsive pathways. Identification of which genes respond to specific perturbation stressors can provide better understanding of the underlying mechanisms of disease and advance the identification of new drug targets.

  • Model-based identification of conditionally-essential genes from transposon-insertion sequencing data

    PLoS Computational Biology · 2022-03-07 · 5 citations

    article · Open access · Senior author · Corresponding

    The understanding of bacterial gene function has been greatly enhanced by recent advancements in the deep sequencing of microbial genomes. Transposon insertion sequencing methods combine next-generation sequencing techniques with transposon mutagenesis for the exploration of the essentiality of genes under different environmental conditions. We propose a model-based method that uses regularized negative binomial regression to estimate the change in transposon insertions attributable to gene-environment changes in this genetic interaction study without transformations or uniform normalization. An empirical Bayes model for estimating the local false discovery rate combines unique and total count information to test for genes that show a statistically significant change in transposon counts. When applied to RB-TnSeq (randomized barcode transposon sequencing) and Tn-seq (transposon sequencing) libraries made in strains of Caulobacter crescentus using both total and unique count data, the model was able to identify a set of conditionally beneficial or conditionally detrimental genes for each target condition that shed light on their functions and roles during various stress conditions.

  • A Bayesian nonparametric model for inferring subclonal populations from structured DNA sequencing data

    The Annals of Applied Statistics · 2021-06-01 · 1 citation

    article · Open access · Senior author

    There are distinguishing features or "hallmarks" of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important in the personalized design of targeted therapeutic combinations and monitoring for possible recurrence after treatment. We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy and we use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulation data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analysis of a real acute lymphoblastic leukemia cancer sequencing dataset shows that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.

  • Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering

    International Conference on Artificial Intelligence and Statistics · 2021-03-18

    article

Education

  • Postdoc, Biochemistry

    Stanford University

    2012
  • PhD, Electrical Engineering and Computer Science

    University of California Berkeley

    2006
  • BS, Electrical Engineering

    Rochester Institute of Technology

    2000