Harmen Bussemaker
Professor of Biological Sciences and of Systems Biology · Columbia University · Molecular, Cellular, and Developmental Biology
Active 1992–2025
Research topics
- Biology
- Computational biology
- Biochemistry
- Computer Science
- Chemistry
- Biological system
Selected publications
Protein Science · 2025-10-14 · 1 citation
Article · Open access · Senior author · Corresponding
Short linear peptide motifs play important roles in phosphotyrosine-dependent signaling networks. They can act both as substrates of kinases and phosphatases and as ligands of peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. In recent years, protein display technologies and next-generation sequencing (NGS) have allowed researchers to profile SH2 domain binding across large libraries of candidate ligands. Here, we present a concerted experimental and computational strategy that updates such specificity profiling from classification to quantification. Multi-round affinity selection on random phosphopeptide libraries yields NGS data suitable for training an additive model that accurately predicts binding free energy across the full theoretical ligand sequence space. For SH2 domains that have been profiled in this manner, the sequence-to-affinity model can be used to predict novel phosphosite targets or the impact of phosphosite variants on binding.
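A minimal sketch may help show what an additive sequence-to-affinity model computes. In an additive model each position contributes independently to the binding free energy, so the total ΔΔG/RT of a phosphopeptide is a sum of per-position terms and its relative affinity is exp(−ΔΔG/RT). The energy table, the three-position window (pY+1 to pY+3), the default penalty, and the function names below are hypothetical illustrations, not the fitted models or software from the paper.

```python
import math

# Hypothetical energy table (made-up values, in units of RT) for the three
# positions C-terminal to the phosphotyrosine: pY+1, pY+2, pY+3.
# Lower values mean a more favorable contribution to binding.
DDG = [
    {"E": 0.0, "V": 0.4, "A": 0.8},  # pY+1
    {"E": 0.0, "N": 0.3, "A": 0.9},  # pY+2
    {"V": 0.0, "I": 0.2, "A": 1.1},  # pY+3
]
DEFAULT_PENALTY = 2.0  # assumed penalty for residues absent from the table


def ddg_over_rt(peptide: str) -> float:
    """Additive model: total ΔΔG/RT is the sum of independent per-position terms."""
    if len(peptide) != len(DDG):
        raise ValueError("peptide length must match the model width")
    return sum(pos.get(aa, DEFAULT_PENALTY) for pos, aa in zip(DDG, peptide))


def relative_affinity(peptide: str) -> float:
    """Affinity relative to the optimal ligand: exp(-ΔΔG/RT)."""
    return math.exp(-ddg_over_rt(peptide))
```

Because the model is additive, scoring every peptide in the theoretical sequence space only requires a table lookup per position, which is what makes exhaustive prediction tractable.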
Nucleic Acids Research · 2025-08-08 · 1 citation
Article · Open access · Senior author
Sequence-specific interactions of transcription factors (TFs) with genomic DNA underlie many cellular processes. High-throughput in vitro binding assays coupled with machine learning have made it possible to accurately define such molecular recognition in a biophysically interpretable way for hundreds of TFs across many structural families, providing new avenues for predicting how the sequence preference of a TF is impacted by disease-associated mutations in its DNA binding domain. We developed a method based on a reference-free tetrahedral representation of variation in base preference within a given structural family that can be used to accurately predict the effect of mutations in the protein sequence of the TF. Using the basic helix-loop-helix (bHLH) and homeodomain (HD) families as test cases, our results demonstrate the feasibility of accurately predicting the shifts (ΔΔΔG/RT) in binding free energy associated with TF mutants by leveraging high-quality DNA binding models for sets of homologous wild-type TFs.
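The abstract does not spell out how the tetrahedral representation is built. One plausible construction, assumed here for illustration and not necessarily the authors' method, places the four bases at the vertices of a regular tetrahedron centered on the origin. Because the vertices sum to zero, adding the same constant to all four base energies at a position leaves the resulting 3-D point unchanged, so no reference base has to be chosen. A shift ΔΔΔG/RT between a mutant and a wild-type model then becomes a difference of points. The base-to-vertex assignment and all function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # row order of VERTICES; the assignment is arbitrary
# Unit vertices of a regular tetrahedron centered on the origin.
# They sum to zero and any two distinct vertices have dot product -1/3.
VERTICES = np.array([
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]) / np.sqrt(3.0)


def to_tetrahedral(ddg: np.ndarray) -> np.ndarray:
    """Map per-position base energies (L x 4) to 3-D points (L x 3).

    A constant added to all four bases at a position cancels out,
    because the vertices sum to zero (reference-free).
    """
    return ddg @ VERTICES


def from_tetrahedral(points: np.ndarray) -> np.ndarray:
    """Recover mean-centered base energies (L x 4) from 3-D points.

    For centered energies e, x . v_b = (4/3) e_b, hence the factor 3/4.
    """
    return 0.75 * points @ VERTICES.T


def binding_shift(wt: np.ndarray, mut: np.ndarray) -> np.ndarray:
    """Mean-centered per-base shift (mutant minus wild type) at each position."""
    return from_tetrahedral(to_tetrahedral(mut) - to_tetrahedral(wt))
```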
bioRxiv (Cold Spring Harbor Laboratory) · 2024-12-23
Preprint · Open access · Senior author · Corresponding
Short linear peptide motifs play important roles in cell signaling. They can act as modification sites for enzymes and as recognition sites for peptide binding domains. SH2 domains bind specifically to tyrosine-phosphorylated proteins, with the affinity of the interaction depending strongly on the flanking sequence. Quantifying this sequence specificity is critical for deciphering phosphotyrosine-dependent signaling networks. In recent years, protein display technologies and deep sequencing have allowed researchers to profile SH2 domain binding across thousands of candidate ligands. Here, we present a concerted experimental and computational strategy that improves the predictive power of SH2 specificity profiling. Through multi-round affinity selection and deep sequencing with large randomized phosphopeptide libraries, we produce suitable data to train an additive binding free energy model that covers the full theoretical ligand sequence space. Our models can be used to predict signaling network connectivity and the impact of missense variants in phosphoproteins on SH2 binding.
Genome Biology · 2024-10-31 · 2 citations
Article · Open access · Senior author
BACKGROUND: Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. RESULTS: We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. CONCLUSION: Our work provides new strategies for predicting the functional impact of non-coding variants.
Identifying genetic regulatory variants that affect transcription factor activity
Cell Genomics · 2023-08-18 · 7 citations
Article · Open access · Senior author · Corresponding
Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.
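The neighboring-gene-pair idea can be sketched with the simplest member of the GLM family: a Gaussian model with identity link and no intercept, which reduces to ordinary least squares through the origin. Differencing log expression within a pair of neighboring genes cancels any offset the two genes share (standing in here for local chromatin state), and the slope of those differences against the corresponding differences in perturbation response serves as an activity estimate for one individual. The input conventions (per-individual log-expression values, a perturbation-response vector, index pairs) and the function name are assumptions, not the published model.

```python
from typing import Sequence, Tuple


def tf_activity(
    log_expr: Sequence[float],
    perturbation_response: Sequence[float],
    neighbor_pairs: Sequence[Tuple[int, int]],
) -> float:
    """Estimate one individual's TF activity from within-pair differences.

    For each pair (g, h) of neighboring genes:
      x = response to TF perturbation, gene g minus gene h
      y = log expression in this individual, gene g minus gene h
    A shared local offset (e.g. chromatin state) cancels in y.
    Returns the least-squares slope of y on x through the origin.
    """
    sxy = 0.0
    sxx = 0.0
    for g, h in neighbor_pairs:
        x = perturbation_response[g] - perturbation_response[h]
        y = log_expr[g] - log_expr[h]
        sxy += x * y
        sxx += x * x
    if sxx == 0.0:
        raise ValueError("perturbation responses do not vary within any pair")
    return sxy / sxx
```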
Systematic dissection of the regulatory logic of transcriptional activation by TP53 binding sites
Zenodo (CERN European Organization for Nuclear Research) · 2023-02-21
Article · Open access
This deposition contains supplementary data files and laboratory notebooks for the manuscript 'Systematic dissection of the regulatory logic of transcriptional activation by TP53 binding sites'. In this study, thousands of barcoded TP53 reporters were systematically designed and probed in MCF7 and TP53-KO MCF7 cells upon stimulation with Nutlin-3a. Barcodes in the cDNA were quantified by high-throughput sequencing. Among other findings, the results demonstrate that TP53 drives transcription cooperatively from adjacent TP53 binding sites, and that the positioning of the TP53 binding sites is crucial for transcriptional activation. The deposited files include:
- Laboratory notebooks (PDFs printed from Labguru), covering the generation of the plasmid pools, the library transfection protocols, and the western blotting procedure. Raw gel images and related files are also available in this folder.
- Clustered barcode count files: the raw barcode counts extracted from the sequencing files and clustered using starcode.
- A reporter information sheet: a table containing all relevant reporter information and reporter sequences.
- A reporter activity sheet: the calculated reporter activities, which can be used to recreate the figures presented in the manuscript.
- A supplementary table containing the primer sequences used in this study.
Benchmarking DNA binding affinity models using allele-specific transcription factor binding data
bioRxiv (Cold Spring Harbor Laboratory) · 2023-12-15
Preprint · Open access · Senior author · Corresponding
Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity can manifest itself in vivo at heterozygous loci as a difference in TF occupancy between the two alleles. When applied on a genomic scale, functional genomic assays such as ChIP-seq typically lack the statistical power to detect allele-specific binding (ASB) at the level of individual variants. To address this, we propose a framework for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We show that a likelihood function based on an over-dispersed binomial distribution can aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. We introduce PyProBound, an easily extensible reimplementation of the ProBound biophysically interpretable machine learning framework. Configuring PyProBound to explicitly account for a confounding sequence-specific bias in DNA fragmentation rate yields improved TF binding models when training on ChIP-seq data. We also show how our likelihood function can be leveraged to perform de novo motif discovery on the raw allele-aware ChIP-seq counts.
Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements
Nucleic Acids Research · 2023-08-31 · 10 citations
Article · Open access
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.
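A log-linear model treats the logarithm of reporter activity as a linear function of sequence features, so feature effects multiply on the activity scale. The sketch below fits such a model by ordinary least squares on log activity. The feature names (including an indicator for an optimally spaced pair of sites, which is one way a supra-additive effect could enter) are hypothetical, and the fitting procedure is an assumption rather than the one used in the paper.

```python
import numpy as np

# Hypothetical feature columns, in the order expected for X.
FEATURES = ["n_medium_sites", "n_high_sites", "optimally_spaced_pair"]


def fit_log_linear(X: np.ndarray, activity: np.ndarray) -> np.ndarray:
    """Fit log(activity) = b0 + X @ w by least squares; returns [b0, w...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.log(activity), rcond=None)
    return coef


def predict_activity(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Back-transform: effects add in log space, so they multiply in activity."""
    return np.exp(coef[0] + X @ coef[1:])
```

Working in log space keeps predicted activities positive and lets an interaction feature express a multiplicative boost beyond what the individual sites contribute.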
Defining the fine structure of promoter activity on a genome-wide scale with CISSECTOR
Nucleic Acids Research · 2023-04-04 · 3 citations
Article · Open access · Senior author · Corresponding
Classic promoter mutagenesis strategies can be used to study how proximal promoter regions regulate the expression of particular genes of interest. This is a laborious process, in which the smallest sub-region of the promoter still capable of recapitulating expression in an ectopic setting is first identified, followed by targeted mutation of putative transcription factor binding sites. Massively parallel reporter assays such as survey of regulatory elements (SuRE) provide an alternative way to study millions of promoter fragments in parallel. Here we show how a generalized linear model (GLM) can be used to transform genome-scale SuRE data into a high-resolution genomic track that quantifies the contribution of local sequence to promoter activity. This coefficient track helps identify regulatory elements and can be used to predict promoter activity of any sub-region in the genome. It thus allows in silico dissection of any promoter in the human genome to be performed. We developed a web application, available at cissector.nki.nl, that lets researchers easily perform this analysis as a starting point for their research into any promoter of interest.
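Once a per-base coefficient track has been fitted, predicting the activity of a sub-region reduces to summing the coefficients over that window. The sketch below assumes a log-link GLM, so a window's predicted activity is the exponential of its coefficient sum, and uses prefix sums so each query takes constant time. The class name, the log link, and the half-open window convention are assumptions; this is not the CISSECTOR implementation.

```python
import math
from itertools import accumulate
from typing import Sequence


class CoefficientTrack:
    """Per-base coefficients from a (log-link) GLM fit to reporter data."""

    def __init__(self, coefficients: Sequence[float]):
        # prefix[i] holds the sum of coefficients[0:i], so any window sum is O(1).
        self.prefix = [0.0, *accumulate(coefficients)]

    def log_activity(self, start: int, end: int) -> float:
        """Linear predictor for the half-open window [start, end)."""
        if not 0 <= start < end <= len(self.prefix) - 1:
            raise ValueError("window lies outside the track")
        return self.prefix[end] - self.prefix[start]

    def activity(self, start: int, end: int) -> float:
        """Predicted activity on the original scale (inverse of the log link)."""
        return math.exp(self.log_activity(start, end))
```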
Optimisation of TP53 reporters by systematic dissection of synthetic TP53 response elements
bioRxiv (Cold Spring Harbor Laboratory) · 2023-07-27
Preprint · Open access
TP53 is a transcription factor that controls multiple cellular processes, including cell cycle arrest, DNA repair, and apoptosis. The relation between TP53 binding site architecture and transcriptional output is still not fully understood. Here, we systematically examined in three different cell lines the effects of binding site affinity and copy number on TP53-dependent transcriptional output, and also probed the impact of spacer length and sequence between adjacent binding sites, and of core promoter identity. Paradoxically, we found that high-affinity TP53 binding sites are less potent than medium-affinity sites. TP53 achieves supra-additive transcriptional activation through optimally spaced adjacent binding sites, suggesting a cooperative mechanism. Optimally spaced adjacent binding sites have a ∼10-bp periodicity, suggesting a role for spatial orientation along the DNA double helix. We leveraged these insights to construct a log-linear model that explains activity from sequence features, and to identify new highly active and sensitive TP53 reporters.
Recent grants
NIH · $557k · 2018
NIH · $5.0M · 2015–2025
NIH · $7.0M · 2022
Frequent coauthors
- Bas van Steensel · The Netherlands Cancer Institute · 57 shared
- Chaitanya Rastogi · Columbia University · 43 shared
- Judith F. Kribelbauer · SIB Swiss Institute of Bioinformatics · 32 shared
- H. Tomas Rube · Columbia University · 31 shared
- Richard S. Mann · Columbia University · 30 shared
- Marinus F. van Batenburg · Columbia University · 24 shared
- Jeffrey J. Delrow · University of Cambridge · 20 shared
- Vincent FitzPatrick · Columbia University · 18 shared
Education
- 1995 · PhD, Theoretical Physics · Utrecht University