Mark Gerstein

Albert L. Williams Professor of Molecular Biophysics and Biochemistry; Professor of Statistics and Data Science · Verified

Yale University · Department of Statistics and Data Science

Active 1991–2026

h-index: 217
Citations: 303.3k
Papers: 1.2k (340 in the last 5 years)
Funding: $218.6M (4 active)
About

Mark Gerstein is the Albert L. Williams Professor and Principal Investigator of the Gerstein Bioinformatics Group at Yale University. His research focuses on bioinformatics, particularly the development of computational methods for analyzing biological data, and applies data-driven approaches to complex biological systems across genomics, systems biology, and computational biology. He mentors a diverse group of researchers, including postdoctoral associates, graduate students, and undergraduates. His role at Yale combines research with leadership in the bioinformatics community, bridging biology and computer science.

Research topics

  • Biology
  • Genetics
  • Computational biology
  • Computer Science
  • Evolutionary biology
  • Neuroscience
  • Machine Learning
  • Medicine
  • Biological system
  • Endocrinology
  • Biochemistry
  • Biophysics
  • Immunology
  • Mathematics
  • Pathology
  • Psychology
  • Materials science
  • Statistics
  • Chemistry
  • Cell biology
  • Psychiatry
  • Virology
  • Cancer research

Selected publications

  • Transcriptomic and phenotypic convergence of neurodevelopmental disorder risk genes in vitro and in vivo

    Nature Neuroscience · 2026-04-24

    article · Open access

    Diverse risk genes have been identified for neurodevelopmental disorders (NDDs), but how these genes converge on similar biological pathways in neurons, and thus give rise to similar phenotypes, is unclear. Here we apply a pooled CRISPR approach to successfully target 23 NDD loss-of-function genes with roles in chromatin biology and examine convergent effects on gene expression across human induced pluripotent stem cell-derived neural progenitor cells, glutamatergic neurons and GABAergic neurons. Points of convergence vary between these cell types, with the greatest number of convergent genes and strongest convergent networks in mature glutamatergic neurons, where they broadly represent synaptic, epigenetic and, unexpectedly, mitochondrial pathways. The most convergent networks were observed between NDD genes with shared biological annotations, clinical associations and co-expression patterns in human post-mortem brain. Drugs that were predicted to reverse convergent transcriptomic signatures and/or arousal and sensory processing behaviors ameliorated behavioral phenotypes in zebrafish NDD gene mutants. These results suggest that convergent effects of NDD risk genes could provide clinically useful insights.

  • Interpretability and implicit model semantics in biomedicine and deep learning

    Nature Machine Intelligence · 2026-03-23

    article · Senior author · Corresponding
  • Dynamic convergence of neurodevelopment disorder risk genes: Seahorse Mito Stress and mitochondrial morphology datasets

    Zenodo (CERN European Organization for Nuclear Research) · 2025-07-16

    dataset · Open access

    These data tables contain results and statistical analyses from the Seahorse Mito Stress assay and TOMM20 immunostaining for mitochondrial morphology. Detailed methods and final figures are available at DOI: https://doi.org/10.1101/2024.08.23.609190

  • The chronODE framework for modelling multi-omic time series with ordinary differential equations and machine learning

    Nature Communications · 2025-08-19 · 2 citations

    article · Open access · Senior author · Corresponding

    Many genome-wide studies capture isolated moments in cell differentiation or organismal development. Conversely, longitudinal studies provide a more direct way to study these kinetic processes. Here, we present an approach for modeling gene-expression and chromatin kinetics from such studies: chronODE, an interpretable framework based on ordinary differential equations. chronODE incorporates two parameters that capture biophysical constraints governing the initial cooperativity and later saturation in gene expression. These parameters group genes into three major kinetic patterns: accelerators, switchers, and decelerators. Applying chronODE to bulk and single-cell time-series data from mouse brain development reveals that most genes (~87%) follow simple logistic kinetics. Among them, genes with rapid acceleration and high saturation values are rare, highlighting biochemical limitations that prevent cells from attaining both simultaneously. Early- and late-emerging cell types display distinct kinetic patterns, with essential genes ramping up faster. Extending chronODE to chromatin, we find that genes regulated by both enhancer and silencer cis-regulatory elements are enriched in brain-specific functions. Finally, we develop a bidirectional recurrent neural network to predict changes in gene expression from corresponding chromatin changes, successfully capturing the cumulative effect of multiple regulatory elements. Overall, our framework allows investigation of the kinetics of gene regulation in diverse biological systems.

  • FAVOR 2.0: A reengineered functional annotation of variants online resource for interpreting genomic variation

    Nucleic Acids Research · 2025-10-21 · 1 citation

    article · Open access

    The Functional Annotation of Variants Online Resource (FAVOR), http://favor.genohub.org, is a whole genome variant annotation database and portal that provides comprehensive variant functional annotations of all possible variants across the genome. It can facilitate the analysis of whole-genome sequencing studies, support the interpretation of variant functional impacts, and help prioritize causal variants of diseases or traits. To support the growing popularity and expand the scope of FAVOR, we present here a substantial platform update. The new release features dramatically expanded annotations, a completely redesigned infrastructure powered by a newly implemented application programming interface (FAVOR-API), and a revamped web interface with advanced data-visualization capabilities and enhanced query performance. Key expansions include much more comprehensive variant annotations, including global, tissue- and cell-type-specific variant annotations; gene and protein annotations; support for both hg38 and hg19 reference genomes; and an interactive genome-browser for visualization of multi-faceted variant annotations. The updated platform also includes FAVOR-GPT, a large language model-powered interface for navigating the FAVOR database and interpreting results. FAVOR continues to evolve to keep pace with advances in research on interpreting the functional and phenotypic impact of genomic variation.

  • The IGVF catalog—from genetic variation to function

    Nucleic Acids Research · 2025-12-08 · 3 citations

    article · Open access

    Genomic variation between individuals is essential for understanding how differences in the genome sequence affect molecular and cellular processes. The Impact of Genomic Variation on Function (IGVF) Consortium aims to uncover the relationships among genomic variation, genome function, and phenotypes by combining experimental techniques, such as single-cell mapping and genomic perturbation assays, with computational approaches such as machine learning-based predictive modeling. The IGVF Data and Administrative Coordinating Centers collect, analyze, and disseminate data and results from across the consortium through an open-source platform called the IGVF Catalog. This resource includes, but is not limited to, data on the effects of coding variants on protein abundance and function, noncoding variants on enhancer activity (measured by MPRA or predicted computationally), and associations between variants and quantitative traits. All data are organized within a graph database comprising over 50 types of data collections with nearly 3 billion nodes and over 7.5 billion edges. The Catalog offers public API endpoints (https://api.catalogkg.igvf.org/) and a user-friendly interface for exploring, querying, and visualizing the data at https://catalog.igvf.org. We expect that this open-access platform will support the broader scientific community to advance our understanding of how genomic variation influences biology and disease.

  • Epigenetic characterization of pseudogenes across human tissues

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-10-06

    preprint · Open access · Senior author · Corresponding

    Pseudogenes have historically been regarded as nonfunctional remnants of genome evolution. However, relative to other noncoding genomic elements, their promoter architecture and epigenetic regulation remain incompletely understood. Here, we systematically characterize pseudogene promoters and compare them with those of protein-coding genes and long noncoding RNAs. To do this, we integrate matched transcriptomic and epigenomic data across 26 human tissues from the EN-TEx (ENCODE-GTEx) project. We uniformly annotate promoters with chromatin features (histone modifications, chromatin accessibility, and DNA methylation), sequence motifs, and evolutionary conservation, generating an online catalog. Leveraging this catalog, we show that, across multiple tissues, transcribed, unprocessed pseudogenes exhibit chromatin patterns similar to those of active protein-coding genes. In contrast, transcribed, processed pseudogenes show a different pattern: most lack the canonical hallmarks of transcription (e.g., active histone marks) at their promoters. Instead, their promoters show increased overlap with LINE elements, enrichment for YY1-like binding motifs, and higher Hi-C contact frequency, particularly with distal enhancer-like regulatory regions. Together with their greater conservation (relative to unprocessed pseudogenes), these features suggest that the transcription of processed pseudogenes may require regulatory mechanisms distinct from canonical promoter-associated epigenetic activation.

  • Aerosol-based exposure to opportunistic pathogens originating from hospital sink drains

    American Journal of Infection Control · 2025-11-04 · 3 citations

    article
  • DNA shape and epigenomics distinguish the mechanistic origin of human genomic structural variations

    Nucleic Acids Research · 2025-11-07 · 1 citation

    article · Open access

    The recent advent of long-read whole genome sequencing has enabled us to create an accurate telomere-to-telomere reference genome, construct pangenome graphs, and compile precise catalogs of genomic structural variations (SVs). These comprehensive SV repositories provide an excellent opportunity to explore the role of SVs in genotype-phenotype associations and examine the mechanisms by which SVs are introduced through double-strand break (DSB) repair. Here, we employed comprehensive SV catalogs identified through various short- and long-read whole genome sequencing efforts to infer the underlying mechanisms of SV introduction based on their genomic and epigenomic profiles. Our findings indicate that high local DNA methylation and DNA shape-related features, such as low variations in propeller twist, support the origins of homology-driven SVs. Subsequently, we utilized an active-learning-based unsupervised clustering approach, revealing that homology-dependent SVs show greater evidence of retaining ancestral recombination patterns compared to their homology-independent counterparts. Finally, our comparison of inherited and de novo SVs from healthy populations and rare disease cohorts showed distinct upstream H3K27me3 levels in de novo SVs from individuals with ultra-rare disorders. These findings highlight genome-wide characteristics that may influence the choice of repair mechanisms linked to heritable SV origins.

  • Complex genetic variation in nearly complete human genomes

    Nature · 2025-07-23 · 67 citations

    article · Open access

    Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (median continuity of 130 Mb), closing 92% of all previous assembly gaps [1,2] and reaching telomere-to-telomere status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8 and AMY1/AMY2, and fully resolve 1,852 complex structural variants. In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite higher-order repeat array length and characterize the pattern of mobile element insertions into α-satellite higher-order repeat arrays. Although most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference [1] significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference [3] to a median quality value of 45. Using this approach, 26,115 structural variants per individual are detected, substantially increasing the number of structural variants now amenable to downstream disease association studies.

Frequent coauthors

  • Joel Rozowsky

    Lieber Institute for Brain Development

    301 shared
  • Mark A. Rubin

    University of Bern

    296 shared
  • M Snyder

    286 shared
  • Andrea Sboner

    Weill Cornell Medicine

    215 shared
  • Rory Johnson

    University Hospital of Bern

    210 shared
  • Jan O. Korbel

    German Cancer Research Center

    207 shared
  • Lars Feuerbach

    German Cancer Research Center

    168 shared
  • Rajiv Dhir

    Shadyside Hospital

    158 shared

Education

  • PhD, Chemistry

    University of Cambridge

    1993