Alexis Battle
Wu and Zhang Professor of Biomedical Engineering and Computer Science, with a secondary appointment in Genetic Medicine · Johns Hopkins University · Biochemistry and Molecular Biology
Active 2002–2026
About
Alexis Battle is the Wu and Zhang Professor of Biomedical Engineering and Computer Science at Johns Hopkins University, with a secondary appointment in genetic medicine. She directs the Malone Center for Engineering in Healthcare and analyzes large-scale genomic sequencing data to understand the impact of genetic variation on the human body. Her research focuses on developing computational biology tools and machine-learning strategies to examine the effects of genetic differences on gene regulation and disease. Battle is a leading member of the NIH’s Genotype-Tissue Expression (GTEx) Consortium, where she works on predicting the effects of variation in noncoding DNA sequences. Her work on a GTEx project, which studied how genetic patterns lead to molecular changes within specific tissues, was published in Nature in 2017. Her research also includes developing methods to evaluate and predict the impact of personal genomics and rare genetic variants that may significantly influence health. She is involved in ongoing initiatives such as building integrative networks for genomic analysis of autism and predicting rare Mendelian disease variants, supported by her Searle award and JHU Catalyst Award. Additionally, she received a 2019 Johns Hopkins Discovery Award for her work on the genetics of atherosclerotic cardiovascular disease. Battle earned her BS, MS, and PhD in computer science from Stanford University, completing her PhD in 2013. She worked as a postdoctoral researcher at Stanford before joining Johns Hopkins in 2014. Before her academic career, she was a staff software engineer and manager at Google.
Research topics
- Biology
- Genetics
- Computational biology
- Demography
- Medicine
- Evolutionary biology
- Internal medicine
- Pathology
- Cell biology
Selected publications
Beyond the baseline: mapping the context-specific regulatory landscape of disease
Trends in Genetics · 2026-03-12
Article · Open access · Senior author
Genome-wide association studies have identified thousands of intergenic variants associated with disease, most of which are presumed to act by affecting gene regulation. Standard expression quantitative trait locus (eQTL) studies were able to link many disease-associated loci to changes in gene expression. Yet, many disease-associated loci show no detectable regulatory effects in baseline bulk gene expression datasets from adult tissues. Recent work shows that, overall, standard eQTLs differ systematically from disease-associated loci, pointing to regulatory effects not captured under baseline conditions. We review emerging evidence that context-specific eQTLs, revealed under environmental perturbations, stress, or developmental transitions, resemble disease loci more closely. We highlight new in vitro systems and machine learning approaches that promise systematic identification of these context-dependent effects.
Transcriptomic signatures of rare variant impacts across sex and the X chromosome
Human Genetics and Genomics Advances · 2025-05-31 · 2 citations
Article · Open access
The human X chromosome contains hundreds of genes and has well-established impacts on sex differences and traits. However, the X chromosome is often excluded from many genetic analyses, limiting broader understanding of variant effects. In particular, the functional impact of rare variants on the X chromosome is understudied. To investigate functional rare variants on the X chromosome, we use observations of outlier gene expression from Genotype Tissue Expression consortium data. We show that outlier genes are enriched for having nearby rare variants on the X chromosome, and this enrichment is stronger for males. Using the RIVER model, we identified 733 rare variants in 450 genes predicted to have functional differences between males and females. We examined the pharmacogenetic implications of these variants and observed that 25% of drugs with a known sex difference in adverse drug reactions were connected to genes that contained a sex-biased rare variant. We further identify that sex-biased rare variants preferentially impact transcription factors with predicted sex-differential binding, such as the XIST-modulated SIX1. Overall, we observed more within-sex variation than between-sex variation. Combined, our study investigates functional rare variants on the X chromosome, and further details how sex stratification of variant effect prediction improves identification of rare variants with predicted sex-biased effects, transcription factor biology, and pharmacogenomic impacts.
Epidemiology and Natural History of Preclinical and Clinical Obesity: Insights from the UK Biobank
SSRN Electronic Journal · 2025-01-01
Preprint · Open access
Cancer Research · 2025-12-11
Article · Open access
Castration-resistant prostate cancer (CRPC) is largely dependent on the androgen receptor (AR) for growth and often exhibits hyperactive PI3K signaling, most frequently because of PTEN loss. Therapeutic pressure from anti-AR therapies can induce transdifferentiation toward an AR-independent phenotype. Recently, different subtypes of AR-independent CRPC have been redefined, with the stem cell-like (SCL) subtype emerging as one of the most prevalent. Elucidation of the epigenetic mechanisms controlling the maintenance of these distinct CRPC cell states could pave the way for effective combinatorial therapies for CRPC. In this study, we identified a key role for the histone methyltransferase KMT2D in establishing the chromatin competence necessary for the recruitment of AR and FOXA1 transcription factors (TF) that are essential for the AR transcriptional output in AR-dependent CRPC cell lines, patient-derived organoids, and patient samples. Unexpectedly, KMT2D maintained the identity of the AR-low CRPC-SCL subtype and controlled activity of AP-1 TFs such as FOSL1, which acts as a master regulator of this subtype. Single-cell transcriptomics and chromatin assays underscored the role of KMT2D in sustaining a mixed lineage cell state via AP-1 and FOXA1. The combined suppression of PI3K/AKT and KMT2D reduced cell proliferation in prostate cancer cells and patient-derived organoids in both CRPC-AR and CRPC-SCL subtypes. Altogether, these results unveil KMT2D as a major mediator of the epigenetic landscape in subtype-specific CRPC, contributing to tumor growth and therapeutic response. SIGNIFICANCE: KMT2D is a critical regulator of chromatin accessibility and transcriptional landscapes in castration-resistant prostate cancer that drives both AR-dependent and AR-independent subtypes, highlighting KMT2D as a potential therapeutic target.
Genetic determinants and genomic consequences of non-leukemogenic somatic point mutations
UNC Libraries · 2025-12-17
Article · Open access
Clonal hematopoiesis (CH) is defined by the expansion of a lineage of genetically identical cells in blood. Genetic lesions that confer a fitness advantage, such as leukemogenic point mutations or mosaic chromosomal alterations (mCAs), are frequent mediators of CH. However, recent analyses of both single cell-derived colonies of hematopoietic cells and population sequencing cohorts have revealed CH frequently occurs in the absence of known driver genetic lesions. To characterize CH without known driver genetic lesions, we use 51,399 deeply sequenced whole genomes from the NHLBI TOPMed sequencing initiative to perform simultaneous germline and somatic mutation analyses among individuals without leukemogenic point mutations (LPM), which we term CH-LPMneg. We quantify CH by estimating the total mutation burden. Because estimating somatic mutation burden without a paired-tissue sample is challenging, we develop a novel statistical method, the Genomic and Epigenomic informed Mutation (GEM) rate, that uses external genomic and epigenomic data sources to distinguish artifactual signals from true somatic mutations. We perform a genome-wide association study of GEM to discover the germline determinants of CH-LPMneg. We identify seven genes associated with CH-LPMneg (TCL1A, TERT, SMC4, NRIP1, PRDM16, MSRA, SCARB1). Functional analyses of SMC4 and NRIP1 implicated altered hematopoietic stem cell self-renewal and proliferation as the primary mediator of mutation burden in blood. We then perform comprehensive multi-tissue transcriptomic analyses, finding that the expression levels of 404 genes are associated with GEM. Finally, we perform phenotypic association meta-analyses across four cohorts, finding that GEM is associated with increased white blood cell count, but is not significantly associated with incident stroke or coronary disease events. Overall, we develop GEM for quantifying mutation burden from WGS and use GEM to discover the genetic, genomic, and phenotypic correlates of CH-LPMneg.
Genome Research · 2025-07-17
Article · Open access · Senior author
Gene coexpression networks (GCNs) describe relationships among genes that maintain cellular identity and homeostasis. However, typical RNA-seq experiments often lack sufficient sample sizes for reliable GCN inference. recount3, a data set with 316,443 processed human RNA-seq samples, provides an opportunity to improve network reconstruction. However, GCN inference from public data is challenged by confounders and inconsistent labeling. To address this, we develop a pipeline to annotate samples based on cell-type composition. By comparing aggregation strategies, we find that regressing confounders within studies and prioritizing larger studies optimizes network reconstruction. We apply these findings to infer three consensus networks (universal, cancer, noncancer) and 27 context-specific networks. Central genes in consensus networks are enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas context-specific central nodes include tissue-specific transcription factors. The increased statistical power from data aggregation facilitates the derivation of variant annotations from context-specific networks, which are significantly enriched for complex-trait heritability independent of overlap with baseline functional genomic annotations. Although data aggregation led to strictly increasing held-out log-likelihood, we observe diminishing marginal improvements, suggesting that integrating complementary modalities, such as Hi-C and ChIP-seq, can further refine network reconstruction. Our approach outlines best practices for GCN inference and highlights both the strengths and limitations of data aggregation.
Leveraging and partitioning polygenic risk scores to identify cancer-related proteins
medRxiv · 2025-11-25
Preprint · Open access
Background: Large-scale genome-wide association studies (GWAS) have identified numerous common susceptibility variants associated with various cancers but underlying molecular mechanisms remain largely unknown. Methods: Here we investigated the associations of susceptibility SNPs from 21 cancers with 4,955 plasma protein levels measured in cancer-free participants (N=8,664) from the Atherosclerosis Risk in Communities (ARIC) study. We used two complementary approaches, one based on analysis of associations of polygenic risk scores with the plasma proteome (pQTS) and the other based on a sparse canonical correlation analysis of the cancer-associated SNPs with the plasma proteome (ARCHIE), to detect potential mediating proteins and sub-networks. Results: )-associations between cancer related SNPs and proteins. ARCHIE identified 19 significantly associated protein networks encompassing a broader set of 433 proteins often including the proteins identified by pQTS. We found that the proteins identified by pQTS and/or ARCHIE were enriched for relevant biological processes and cell types as well as cancer drivers and have somatic evidence of being associated with the respective cancers. For example, using SNPs associated with risk of basal cell carcinoma, we identified two protein sets having distinct functions: one primarily enriched in immune and inflammatory responses while the other enriched in pigmentation. Additionally, we identified proteins associated with multiple related cancers indicating potential pleiotropic protein activity. Conclusion: Our analysis leverages known GWAS associations for cancers to identify protein networks underlying cancer risk and accordingly partition polygenic risk scores into mechanistic components. As detailed molecular data of relevant tissues, cell-types and developmental stage become increasingly available, similar approaches will prove to be important for identifying downstream molecular targets for GWAS variants and improve interpretation and research application of polygenic risk scores.
The American Journal of Human Genetics · 2025-07-28
Article · Open access · Senior author
Complex trait-associated genetic variation is highly pleiotropic. This extensive pleiotropy implies that multi-phenotype analyses are informative for characterizing genetic associations, as they facilitate the discovery of trait-shared and trait-specific variants and pathways ("genetic factors"). Previous efforts have estimated genetic factors using matrix factorization (MF) applied to numerous genome-wide association studies (GWASs). However, existing methods are susceptible to spurious factors arising from residual confounding due to sample sharing in biobank GWASs. Furthermore, MF approaches have historically estimated dense factors, loaded on most traits and variants, that are challenging to map onto interpretable biological pathways. To address these shortcomings, we introduce "GWAS latent embeddings accounting for noise and regularization" (GLEANR), an MF method for detection of sparse genetic factors from summary statistics. GLEANR accounts for sample sharing between studies and uses regularization to estimate a data-driven number of interpretable factors. GLEANR is robust to confounding induced by shared samples and improves the replication of genetic factors derived from distinct biobanks. We used GLEANR to evaluate 137 diverse GWASs from the UK Biobank, identifying 58 factors that decompose the genetic architecture of input traits and have distinct signatures of negative selection and degrees of polygenicity. These sparse factors can be interpreted with respect to disease, cell type, and pathway enrichment. We highlight three such factors that captured platelet-measure phenotypes and were enriched for disease-relevant markers corresponding to distinct stages of platelet differentiation. Overall, GLEANR is a powerful tool for discovering both trait-specific and trait-shared pathways underlying complex traits from GWAS summary statistics.
Genome Research · 2025-03-20 · 9 citations
Article · Open access · Senior author
Rare structural variants (SVs)—insertions, deletions, and complex rearrangements—can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore Technologies long-read genomes of 68 individuals from the undiagnosed disease network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4× increase from short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression toward improving the prioritization of functional SVs and TREs in rare disease patients.
Disease-associated loci share properties with response eQTLs under common environmental exposures
bioRxiv (Cold Spring Harbor Laboratory) · 2025-05-04
Preprint · Open access
Many of the genetic loci associated with disease are expected to have context-dependent regulatory effects that are underrepresented in the transcriptomes of healthy, steady-state adult tissues. To understand gene regulation across diverse environmental conditions and cellular contexts, we treated a broad array of human cell types with three environmental exposures in vitro. With single-cell RNA-sequencing data from 1.4 million cells across 51 individuals, we identified hundreds of response expression quantitative loci (eQTLs) that are associated with inter-individual differences in regulatory changes following treatment with nicotine, caffeine, or ethanol in diverse cell types. We also identified dynamic regulatory effects that vary across differentiation trajectories in response to exposure. In contrast to steady-state eQTLs, and similar to disease risk loci, response eQTLs are enriched in distal enhancers and are regulating genes that experienced strong selective constraint, contain complex regulatory landscapes, and display diverse biological functions. We identified response eQTLs that coincide with disease-associated loci not explained by steady-state eQTLs. Our results highlight the complexity of genetic regulatory effects and suggest that our ability to interpret disease-associated loci will benefit from the pursuit of studies of gene-by-environment interactions in diverse biological contexts.
Recent grants
Clinical Translation and Validation Core
NIH · $42.8M · 2021–2027
3/3 Building integrative CNS networks for genomic analysis of autism
NIH · $1.2M · 2016–2021
Modeling the dynamic impact of rare and common genetic variation on gene expression and disease
NIH · $3.1M · 2021–2025
Methods for analysis of regulatory variation in cellular differentiation
NIH · $1.9M · 2016–2021
Frequent coauthors
- Tuuli Lappalainen · Science for Life Laboratory · 60 shared
- Stephen B. Montgomery · Stanford University · 59 shared
- Brandon L. Pierce · Chicago Department of Public Health · 54 shared
- Marios Arvanitis · Johns Hopkins University · 53 shared
- Benjamin J. Strober · Harvard University · 51 shared
- Silva Kasela · 46 shared
- Ashis Saha · 44 shared
- Stephane E. Castel · 44 shared
Awards & honors
- 2016 Searle Scholar
- 2017 JHU Catalyst Award
- 2019 Johns Hopkins Discovery Award
- TIME100 AI 2025 (recognized as part of Cancer AI Alliance)