Harmen Bussemaker

Professor of Biological Sciences and of Systems Biology · Verified

Columbia University · Molecular, Cellular, and Developmental Biology

Active 1992–2025

h-index: 55
Citations: 12.5k
Papers: 151 (25 in the last 5 years)
Funding: $12.6M

Research topics

  • Biology
  • Computational biology
  • Biochemistry
  • Computer Science
  • Chemistry
  • Biological system

Selected publications

  • Accurate affinity models for SH2 domains from peptide binding assays and free-energy regression

    Protein Science · 2025-10-14 · 1 citation

    Article · Open access · Senior author · Corresponding author

    Short linear peptide motifs play important roles in phosphotyrosine-dependent signaling networks. They can act both as substrates of kinases and phosphatases and as ligands of peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. In recent years, protein display technologies and next-generation sequencing (NGS) have allowed researchers to profile SH2 domain binding across large libraries of candidate ligands. Here, we present a concerted experimental and computational strategy that updates such specificity profiling from classification to quantification. Multi-round affinity selection on random phosphopeptide libraries yields NGS data suitable for training an additive model that accurately predicts binding free energy across the full theoretical ligand sequence space. For SH2 domains that have been profiled in this manner, the sequence-to-affinity model can be used to predict novel phosphosite targets or the impact of phosphosite variants on binding.

  • Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning

    Nucleic Acids Research · 2025-08-08 · 1 citation

    Article · Open access · Senior author

    Sequence-specific interactions of transcription factors (TFs) with genomic DNA underlie many cellular processes. High-throughput in vitro binding assays coupled with machine learning have made it possible to accurately define such molecular recognition in a biophysically interpretable way for hundreds of TFs across many structural families, providing new avenues for predicting how the sequence preference of a TF is impacted by disease-associated mutations in its DNA binding domain. We developed a method based on a reference-free tetrahedral representation of variation in base preference within a given structural family that can be used to accurately predict the effect of mutations in the protein sequence of the TF. Using the basic helix-loop-helix (bHLH) and homeodomain (HD) families as test cases, our results demonstrate the feasibility of accurately predicting the shifts (ΔΔΔG/RT) in binding free energy associated with TF mutants by leveraging high-quality DNA binding models for sets of homologous wild-type TFs.

  • Accurate sequence-to-affinity models for SH2 domains from multi-round peptide binding assays coupled with free-energy regression

    bioRxiv (Cold Spring Harbor Laboratory) · 2024-12-23

    Preprint · Open access · Senior author · Corresponding author

    Short linear peptide motifs play important roles in cell signaling. They can act as modification sites for enzymes and as recognition sites for peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. Quantifying this sequence specificity is critical for deciphering phosphotyrosine-dependent signaling networks. In recent years, protein display technologies and deep sequencing have allowed researchers to profile SH2 domain binding across thousands of candidate ligands. Here, we present a concerted experimental and computational strategy that improves the predictive power of SH2 specificity profiling. Through multi-round affinity selection and deep sequencing with large randomized phosphopeptide libraries, we produce suitable data to train an additive binding free energy model that covers the full theoretical ligand sequence space. Our models can be used to predict signaling network connectivity and the impact of missense variants in phosphoproteins on SH2 binding.

  • Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data

    Genome Biology · 2024-10-31 · 2 citations

    Article · Open access · Senior author

    BACKGROUND: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. RESULTS: We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. CONCLUSION: Our work provides new strategies for predicting the functional impact of non-coding variants.

  • Identifying genetic regulatory variants that affect transcription factor activity

    Cell Genomics · 2023-08-18 · 7 citations

    Article · Open access · Senior author · Corresponding author

    Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach.

  • Systematic dissection of the regulatory logic of transcriptional activation by TP53 binding sites

    Zenodo (CERN European Organization for Nuclear Research) · 2023-02-21

    Article · Open access

    This deposition contains supplementary data files and laboratory notebooks of the manuscript 'Systematic dissection of the regulatory logic of transcriptional activation by TP53 binding sites'. In this study, thousands of barcoded TP53 reporters were systematically designed and probed in MCF7 and TP53-KO MCF7 cells upon stimulation with Nutlin-3a. Barcodes in the cDNA were quantified by high-throughput sequencing. Among others, the results demonstrate that TP53 drives transcription cooperatively from adjacent TP53 binding sites, and that the positioning of the TP53 binding sites is crucial for transcriptional activation. The files deposited here include:

      • Laboratory notebooks (PDFs printed from Labguru). This includes the generation of the plasmid pools, the library transfection protocols, and the western blotting procedure. Raw gel images etc. are available in this folder.
      • Clustered barcode count files. These are the raw barcode counts that were extracted from the sequencing files and clustered using starcode.
      • A reporter information sheet. This file includes a table containing all relevant reporter information and reporter sequences.
      • A reporter activity sheet. This file includes the calculated reporter activities and can be used to recreate the figures presented in the manuscript.
      • A supplementary table containing the primer sequences used in this study.

  • Benchmarking DNA binding affinity models using allele-specific transcription factor binding data

    bioRxiv (Cold Spring Harbor Laboratory) · 2023-12-15

    Preprint · Open access · Senior author · Corresponding author

    Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity can manifest itself in vivo at heterozygous loci as a difference in TF occupancy between the two alleles. When applied on a genomic scale, functional genomic assays such as ChIP-seq typically lack the statistical power to detect allele-specific binding (ASB) at the level of individual variants. To address this, we propose a framework for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We show that a likelihood function based on an over-dispersed binomial distribution can aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. We introduce PyProBound, an easily extensible reimplementation of the ProBound biophysically interpretable machine learning framework. Configuring PyProBound to explicitly account for a confounding sequence-specific bias in DNA fragmentation rate yields improved TF binding models when training on ChIP-seq data. We also show how our likelihood function can be leveraged to perform de novo motif discovery on the raw allele-aware ChIP-seq counts.

  • Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements

    Nucleic Acids Research · 2023-08-31 · 10 citations

    Article · Open access

    TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.

  • Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR

    Nucleic Acids Research · 2023-04-04 · 3 citations

    Article · Open access · Senior author · Corresponding author

    Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.

  • Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements

    bioRxiv (Cold Spring Harbor Laboratory) · 2023-07-27

    Preprint · Open access

    TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair, and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.

Frequent coauthors

  • Bas van Steensel

    The Netherlands Cancer Institute

    57 shared
  • Chaitanya Rastogi

    Columbia University

    43 shared
  • Judith F. Kribelbauer

    SIB Swiss Institute of Bioinformatics

    32 shared
  • H. Tomas Rube

    Columbia University

    31 shared
  • Richard S. Mann

    Columbia University

    30 shared
  • Marinus F. van Batenburg

    Columbia University

    24 shared
  • Jeffrey J. Delrow

    University of Cambridge

    20 shared
  • Vincent FitzPatrick

    Columbia University

    18 shared

Education

  • PhD, Theoretical Physics

    Utrecht University

    1995