Ira Hall
Professor of Genetics, Director of the Yale Center for Genomic Health · Yale University · Medical Genetics
Active 2002–2026
About
Ira Hall, PhD, is a Professor of Genetics at Yale School of Medicine and the Director of the Yale Center for Genomic Health, where he leads genetics research and genomic health initiatives.
Research topics
- Genetics
- Biology
- Computational biology
- Evolutionary biology
- Machine Learning
- Computer Science
- Botany
- Demography
- Mathematics
Selected publications
SNP calling, haplotype phasing and allele-specific analysis with long RNA-seq reads
Nature Methods · 2026-03-30
article
LongcallD: joint calling and phasing of small, structural and mosaic variants from long reads
bioRxiv (Cold Spring Harbor Laboratory) · 2026-03-22
article · Open access
Long-read sequencing is a powerful technique capturing multiple variants within single continuous reads. This length allows individual reads to bridge small and structural variants while carrying crucial phasing information. However, current computational tools treat small variant calling, structural variant (SV) detection and phasing as largely disconnected problems, failing to unleash the full potential of long reads. Here, we present longcallD, a unified framework utilizing local multiple-sequence alignment to simultaneously call and phase small and structural variants. By integrating germline phasing and retrotransposition hallmarks, longcallD also identifies low-fraction mosaic variants and detects mobile element insertions supported by a single read. Compared to existing methods, our unified approach substantially improves SV discovery and mosaic variants accuracy while maintaining competitive small variant calling. We anticipate that longcallD will provide a robust foundation for resolving complex genetic architectures in clinical and evolutionary applications.
Nature Communications · 2026-04-18
article · Open access
Chromosome 22q11.2 microdeletion syndrome (22q11.2DS) is mediated by high-identity polymorphic low-copy repeats (LCRA-to-D) that have been challenging to sequence characterize. We sequence-resolved 135 chromosome 22q11.2 haplotypes from diverse humans and define 63 distinct structural configurations differing in size by 11-fold for LCRA. This diversity is driven by a 105 kbp segmental duplication flanked by 25 kbp inverted repeats that arose in the apes but expanded in humans ~1 million years ago. African LCRA haplotypes are significantly longer (p = 0.0047) and predicted to be more protective against 22q11.2DS (p = 1.14×10⁻⁶) due to enrichment of inverted 105 kbp repeats. We identify nine distinct (including five recurrent) inversions spanning LCRA-D. Sequencing four families indicates LCRA-D deletions map to 105 kbp repeats, whereas inversions map to the 25 kbp repeats. Here, we show specific haplotype LCR architectures and recurrent large-scale inversions modulate susceptibility to 22q11.2DS and help explain its reduced prevalence among individuals of African ancestry.
Comparison of variant callers using 60 532 multi-ancestry whole genome sequences
Briefings in Bioinformatics · 2026-03-01
article · Open access
Whole genome sequencing (WGS) studies play a pivotal role in studying the genetic underpinnings of human diseases and traits. High quality and reproducible variant calling is the cornerstone for the success of downstream analyses, including WGS association studies and polygenic risk prediction. This paper compares the data quality, performance, and concordance of two widely used WGS variant callers, the Genome Analysis Toolkit (GATK) and Variant Tool set that discovers short variants (VT), using 60 532 multi-ancestry whole genomes sequenced by the Centers for Common Disease Genomics (CCDGs) of the NHGRI Genome Sequencing Program. Our findings show that both QCed GATK and VT pipelines yield highly consistent and reliable called Single Nucleotide Variants (SNVs) in large-scale WGS studies, supporting their agreements in joint variants calling. However, the two pipelines exhibit greater discrepancies in calling insertions and deletions (INDELs).
Abstract Or139: Single cell variant to enhancer to gene map for coronary artery disease
Arteriosclerosis Thrombosis and Vascular Biology · 2025-04-01
article
Although genome-wide association studies (GWAS) in large populations have identified hundreds of variants associated with common diseases such as coronary artery disease (CAD), most variants lie within non-coding regions of the genome, rendering it difficult to determine the downstream causal gene and cell type. Here, we performed paired single nucleus gene expression and chromatin accessibility profiling from 44 human coronary arteries. To link disease variants to molecular traits, we developed a meta-map of 88 samples and discovered 11,182 single-cell chromatin accessibility quantitative trait loci (caQTLs). Heritability enrichment analysis and disease variant mapping demonstrated that smooth muscle cells (SMCs) harbor the greatest genetic risk for CAD. To capture the continuum of SMC cell states in disease, we used single cell caQTL modeling for the first time in tissue to uncover QTLs whose effects are modified by cell state and expand our insight into genetic regulation in heterogeneous cell populations. We identified a variant in the COL4A1/COL4A2 CAD GWAS locus which becomes a caQTL as SMCs de-differentiate by changing a transcription factor binding site for EGR1/2. To unbiasedly prioritize functional candidate genes, we built a genome-wide single cell variant to enhancer to gene (scV2E2G) map for human CAD to link disease variants to causal genes in cell types. Using this approach, we found several hundred genes predicted to be linked to disease variants in different cell types. Next, we performed genome-wide Hi-C in 16 human coronary arteries to build tissue specific maps of chromatin conformation and link disease variants to integrated chromatin hubs and distal target genes. Using this approach, we show that rs4887091 within the ADAMTS7 GWAS locus modulates function of a super chromatin interactome through a change in a CTCF binding site. Finally, we used CRISPR interference to validate target candidate genes. Collectively, we provide a disease-agnostic framework to translate human genetic findings to identify pathologic cell states and genes driving disease, producing a comprehensive scV2E2G map with genetic and tissue level convergence for future mechanistic and therapeutic studies.
Clade distillation for genome-wide association studies
Genetics · 2025-08-07 · 3 citations
article · Open access · Senior author
Testing inferred haplotype genealogies for association with phenotypes has been a longstanding goal in human genetics given their potential to detect association signals driven by allelic heterogeneity (when multiple causal variants modulate a phenotype) in both coding and noncoding regions. Recent scalable methods for inferring locus-specific genealogical trees along the genome, or representations thereof, have made substantial progress towards this goal; however, the problem of testing these trees for association with phenotypes has remained unsolved due to the growth in the number of clades with increasing sample size. To address this issue, we introduce several practical improvements to the kalis ancestry inference engine, including a general optimal checkpointing algorithm for decoding hidden Markov models, thereby enabling efficient genome-wide analyses. We then propose LOCATER, a powerful new procedure based on the recently proposed Stable Distillation framework, to test local tree representations for trait association. Although LOCATER is demonstrated here in conjunction with kalis, it may be used for testing output from any ancestry inference engine, regardless of whether such engines return discrete tree structures, relatedness matrices, or some combination of the two at each locus. Using simulated quantitative phenotypes, our results indicate that LOCATER achieves substantial power gains over traditional single marker testing, ARG-Needle, and window-based testing in cases of allelic heterogeneity, while also improving causal region localization. These findings suggest that genealogy-based association testing will be a fruitful approach for gene discovery, especially for signals driven by multiple ultra-rare variants.
Structural and transduction patterns of human-specific polymorphic SVA insertions
Mobile DNA · 2025-11-06 · 1 citation
article · Open access
BACKGROUND: SINE variable number tandem repeat Alu elements (SVAs) are a unique group of hominid-specific composite retrotransposons with highly variable internal structure. They represent the youngest TE family in humans and contribute to genetic diversity, evolution, and disease. Recent findings indicate that SVA mobilization rates may exceed previous estimates, and many SVAs exhibit insertion polymorphism. SVAs facilitate transduction (TD) events when transcription initiates upstream of a source element, or when their internal termination signal is bypassed, mobilizing adjacent 5' and/or 3' sequence. To investigate features of non-reference SVA elements currently polymorphic in the human genome, we analyzed a structural variant callset built upon 35 diverse human genomes generated by the Human Genome Structural Variation Consortium. RESULTS: is a major contributor to SVA expansion in the human population. We further uncover that 40% of non-reference SVAs carry a TD on their 5' and/or 3' ends. Of these, the majority (69%) harbor sequence originating in a gene, including 14 exonic events and the mobilization of a processed pseudogene, supporting the role of SVA in exon shuffling. In addition, we identified a so-called "orphan" TD, defined by the absence of SVA sequence at the insertion site. Leveraging TD origin coordinates, we identify 55 active source elements, including nine non-reference and 46 across GRCh38 and T2T-CHM13, giving rise to 84% of TD-carrying SVAs. CONCLUSIONS: is more active than previously described and is a main driver of SVA expansion. We find two-fold more TD events compared to previous estimates, with an unexpected bias toward 3' events. Finally, we postulate that the discrepant SVA mobilization rate may be attributed to inter-individual variation in the presence/absence of source elements, a recent uptick in mobilization supported by overall low allele frequencies, and/or negative selection against deleterious insertions.
Genome-wide association studies of metabolites in Finnish men identify disease-relevant loci
UNC Libraries · 2025-11-08
article · Open access
Genetics in Medicine Open · 2025-01-01 · 1 citation
article · Open access
Sex-specific genetic effects on susceptibility to idiopathic pulmonary fibrosis
ERJ Open Research · 2025-05-08
article · Open access
Background: Idiopathic pulmonary fibrosis (IPF) is a chronic lung condition that is more prevalent in males than females. The reasons for this are not fully understood; differing environmental exposures due to historically sex-biased occupations and diagnostic bias are possible explanations. To date, over 20 independent genetic association signals have been reported for IPF susceptibility, but these have been discovered when combining males and females. The objectives of the present study were to assess whether there is a need to consider sex-specific effects when evaluating genetic risk in clinical prediction models for IPF and to test for sex-specific associations with IPF susceptibility. Methods: We performed a genome-wide single nucleotide polymorphism (SNP)-by-sex interaction study meta-analysis of IPF risk in six independent case-control studies comprising 4561 cases (1280 females, 3281 males) and 22 888 controls (8360 females, 14 528 males) of European genetic ancestry. We used polygenic risk scores (PRSs) comprising common (minor allele frequency >1%) autosomal variants to assess differences in genetic risk prediction between males and females. Results: ). Conclusions: The predictive accuracy of common autosomal SNP-based PRSs did not vary significantly between males and females. We prioritised three genetic variants whose effect on IPF risk may be modified by sex. These findings would not account for the differences in prevalence between males and females. Future studies should ensure adequate representation of both sexes.
Recent grants
A Platform for Large-Scale Discovery in Common Disease
NIH · $76.2M · 2016–2022
Center for Human Reference Genome Diversity
NIH · $18.4M · 2019–2024
Genome-wide investigation of somatic mutation in the developing and aging brain
NIH · $3.4M · 2014–2021
NIH · $2.3M · 2013
Frequent coauthors
- 82 shared
Haley Abel
Washington University in St. Louis
- 75 shared
Aarno Palotie
Institute for Molecular Medicine Finland
- 62 shared
Nathan O. Stitziel
Washington University in St. Louis
- 54 shared
Allison Regier
- 48 shared
David E. Larson
Washington University in St. Louis
- 48 shared
Colby Chiang
Boston Children's Hospital
- 43 shared
Samuli Ripatti
University of Helsinki
- 43 shared
Aki S. Havulinna
Institute for Molecular Medicine Finland
Education
B.A.
Integrative Biology
Awards & honors
- AAAS Newcomb Cleveland Prize (2003)
- Burroughs Wellcome Fund Career Award (2006)
- NIH Director's New Innovator Award (2009)
- March of Dimes Basil O'Connor Research Award (2010)