Harmen Bussemaker

Professor of Biological Sciences and of Systems Biology · Verified

Columbia University · Molecular, Cellular, and Developmental Biology

Active 1992–2025

h-index: 55

Citations: 12.5k

Papers: 151 · 25 in last 5 years

Funding: $12.6M

Research topics

Biology
Computational biology
Biochemistry
Computer Science
Chemistry
Biological system

Selected publications

Accurate affinity models for SH2 domains from peptide binding assays and free‐energy regression
Protein Science · 2025-10-14 · 1 citation
article · Open access · Senior author · Corresponding
Short linear peptide motifs play important roles in phosphotyrosine-dependent signaling networks. They can act both as substrates of kinases and phosphatases and as ligands of peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. In recent years, protein display technologies and next-generation sequencing (NGS) have allowed researchers to profile SH2 domain binding across large libraries of candidate ligands. Here, we present a concerted experimental and computational strategy that updates such specificity profiling from classification to quantification. Multi-round affinity selection on random phosphopeptide libraries yields NGS data suitable for training an additive model that accurately predicts binding free energy across the full theoretical ligand sequence space. For SH2 domains that have been profiled in this manner, the sequence-to-affinity model can be used to predict novel phosphosite targets or the impact of phosphosite variants on binding.
Predicting the DNA binding specificity of transcription factor mutants using family-level biophysically interpretable machine learning
Nucleic Acids Research · 2025-08-08 · 1 citation
article · Open access · Senior author
Sequence-specific interactions of transcription factors (TFs) with genomic DNA underlie many cellular processes. High-throughput in vitro binding assays coupled with machine learning have made it possible to accurately define such molecular recognition in a biophysically interpretable way for hundreds of TFs across many structural families, providing new avenues for predicting how the sequence preference of a TF is impacted by disease-associated mutations in its DNA binding domain. We developed a method based on a reference-free tetrahedral representation of variation in base preference within a given structural family that can be used to accurately predict the effect of mutations in the protein sequence of the TF. Using the basic helix-loop-helix (bHLH) and homeodomain (HD) families as test cases, our results demonstrate the feasibility of accurately predicting the shifts (ΔΔΔG/RT) in binding free energy associated with TF mutants by leveraging high-quality DNA binding models for sets of homologous wild-type TFs.
Accurate sequence-to-affinity models for SH2 domains from multi-round peptide binding assays coupled with free-energy regression
bioRxiv (Cold Spring Harbor Laboratory) · 2024-12-23
preprint · Open access · Senior author · Corresponding
Short linear peptide motifs play important roles in cell signaling. They can act as modification sites for enzymes and as recognition sites for peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. Quantifying this sequence specificity is critical for deciphering phosphotyrosine-dependent signaling networks. In recent years, protein display technologies and deep sequencing have allowed researchers to profile SH2 domain binding across thousands of candidate ligands. Here, we present a concerted experimental and computational strategy that improves the predictive power of SH2 specificity profiling. Through multi-round affinity selection and deep sequencing with large randomized phosphopeptide libraries, we produce suitable data to train an additive binding free energy model that covers the full theoretical ligand sequence space. Our models can be used to predict signaling network connectivity and the impact of missense variants in phosphoproteins on SH2 binding.
Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data
Genome Biology · 2024-10-31 · 2 citations
article · Open access · Senior author
BACKGROUND: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. RESULTS: We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. CONCLUSION: Our work provides new strategies for predicting the functional impact of non-coding variants.
Identifying genetic regulatory variants that affect transcription factor activity
Cell Genomics · 2023-08-18 · 7 citations
article · Open access · Senior author · Corresponding
Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.
Systematic dissection of the regulatory logic of transcriptional activation by TP53 binding sites
Zenodo (CERN European Organization for Nuclear Research) · 2023-02-21
article · Open access
This deposition contains supplementary data files and laboratory notebooks of the manuscript 'Systematic dissection of the regulatory logic of transcriptional activation by TP53 binding sites'. In this study, thousands of barcoded TP53 reporters were systematically designed and probed in MCF7 and TP53-KO MCF7 cells upon stimulation with Nutlin-3a. Barcodes in the cDNA were quantified by high-throughput sequencing. Among others, the results demonstrate that TP53 drives transcription cooperatively from adjacent TP53 binding sites, and that the positioning of the TP53 binding sites is crucial for transcriptional activation. The files deposited here include: (1) Laboratory notebooks (PDFs printed from Labguru), covering the generation of the plasmid pools, the library transfection protocols, and the western blotting procedure; raw gel images etc. are available in this folder. (2) Clustered barcode count files: the raw barcode counts that were extracted from the sequencing files and clustered using starcode. (3) A reporter information sheet: a table containing all relevant reporter information and reporter sequences. (4) A reporter activity sheet: the calculated reporter activities, which can be used to recreate the figures presented in the manuscript. (5) A supplementary table containing the primer sequences used in this study.
Benchmarking DNA binding affinity models using allele-specific transcription factor binding data
bioRxiv (Cold Spring Harbor Laboratory) · 2023-12-15
preprint · Open access · Senior author · Corresponding
Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity can manifest itself in vivo at heterozygous loci as a difference in TF occupancy between the two alleles. When applied on a genomic scale, functional genomic assays such as ChIP-seq typically lack the statistical power to detect allele-specific binding (ASB) at the level of individual variants. To address this, we propose a framework for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We show that a likelihood function based on an over-dispersed binomial distribution can aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. We introduce PyProBound, an easily extensible reimplementation of the ProBound biophysically interpretable machine learning framework. Configuring PyProBound to explicitly account for a confounding sequence-specific bias in DNA fragmentation rate yields improved TF binding models when training on ChIP-seq data. We also show how our likelihood function can be leveraged to perform de novo motif discovery on the raw allele-aware ChIP-seq counts.
Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements
Nucleic Acids Research · 2023-08-31 · 10 citations
article · Open access
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.
Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR
Nucleic Acids Research · 2023-04-04 · 3 citations
article · Open access · Senior author · Corresponding
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements
bioRxiv (Cold Spring Harbor Laboratory) · 2023-07-27
preprint · Open access
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair, and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.

Recent grants

NIH Grant R56AG047344
NIH · $557k · 2018
Integrative analysis of genetic variation and transcription factor networks to elucidate mechanisms of mental health disorders
NIH · $5.0M · 2015–2025
NIH Grant R01HG003008
NIH · $7.0M · 2022

Frequent coauthors

Bas van Steensel
The Netherlands Cancer Institute
57 shared
Chaitanya Rastogi
Columbia University
43 shared
Judith F. Kribelbauer
SIB Swiss Institute of Bioinformatics
32 shared
H. Tomas Rube
Columbia University
31 shared
Richard S. Mann
Columbia University
30 shared
Marinus F. van Batenburg
Columbia University
24 shared
Jeffrey J. Delrow
University of Cambridge
20 shared
Vincent FitzPatrick
Columbia University
18 shared

Education

PhD, Theoretical Physics
Utrecht University
1995
