Louis-Marie Jean Fabrice Bobay
Assistant Professor · Verified · North Carolina State University · Plant and Microbial Biology
Active 2011–2026
Research topics
- Evolutionary biology
- Biology
- Genetics
- Computational biology
- Ecology
- Computer Science
- Artificial Intelligence
- Astrobiology
- Virology
- Programming language
Selected publications
DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling
bioRxiv (Cold Spring Harbor Laboratory) · 2026-03-12
article · Open access · Senior author
Abstract: Physicochemically similar amino acids undergo more frequent substitutions compared to dissimilar amino acid pairs. Despite their clear potential, amino acid similarity matrices remain underused in molecular evolution, partially due to the high number of proposed amino acid distance measures and the lack of agreement on which are most accurate. In this study, we assessed the performance of 30 amino acid distance measures, including a new amino acid distance measure we developed based on recent deep mutational scanning data. We compared these measures across codon substitution models fit to alignments spanning Streptococcus, Drosophila, and mammalian lineages, as well as segregating variants across Escherichia coli strains and human genotypes. We further constructed consensus measures from combinations of top-performing measures in this analysis using the DISTATIS approach and retested these matrices. Our results show that experimentally-derived measures, particularly our new measure and the existing experimental exchangeability (EX) measure, best fit codon substitution patterns across diverse lineages. We found that a consensus measure based on these two approaches, which we named DEX, performed best overall. In addition, although site-specific variant effect predictors are intended to identify deleterious mutations, the representative tools we tested did not outperform amino acid distance measures for predicting mean substitution frequencies. They were, however, substantially more informative for identifying individual highly deleterious mutations. Overall, we provide a systematic comparison of the performance of existing measures, and we introduce an improved general-purpose amino acid distance measure for molecular evolution models.
Significance: Protein-coding genes have long been a focus for researchers studying the strength and direction of selection. By studying non-synonymous substitutions, those that change amino acids, it is possible to estimate the relative strength of selection. Despite widespread interest in such approaches, information on which amino acids are exchanged is underused in molecular evolution models. This is partly because many different measures exist for quantifying amino acid distances, particularly those based on physicochemical properties. A newer class of amino acid distance measures is derived from deep mutational scanning datasets, where virtually every possible substitution is tested for its impact on protein function. We characterised and compared 30 amino acid distance measures, including a novel measure based on deep mutational scanning data. We highlight differences in how well these measures fit real substitution and polymorphism datasets. Overall, we find that DEX, which is a consensus of our new measure and an existing experimental exchangeability measure, represents the best available amino acid distance measure to incorporate into molecular evolution models.
Open MIND · 2026-03-09
dataset · Senior author
Key datafiles for the manuscript entitled "DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling" by Gavin Douglas and Louis-Marie Bobay. The key table most readers will be interested in is "all_distance_measures_symmetric.tsv.gz". This is a tab-delimited table with the pairwise amino acid distances based on all the measures we evaluated. Each row corresponds to a different amino acid pair, but note that the distances are symmetric for all measures (i.e., those with asymmetric distances between amino acids were averaged to be symmetric). The final columns in this table that indicate combined measures with "+" are non-focal DISTATIS consensus measures. The following subdirectories are in the compressed folder called "workflow_files" and contain the key files for running our analyses:
- aa_metrics: Working files for processing and analyzing AA distance/similarity measures. Note that those interested in the final measures should use "all_distance_measures_symmetric.tsv.gz".
- allele_freq_vs_predicted_effects: Key files used for analyzing segregating non-synonymous polymorphisms.
- PAML_workflow: Files for fitting codon substitution models with PAML.
- proteinGym: Files from the proteinGym database used for producing the custom DMS-EX measure.
bioRxiv (Cold Spring Harbor Laboratory) · 2026-03-03
article · Open access · Senior author · Corresponding
Abstract
Summary: Metagenomics provides broad insights into microbial communities, but more biologically relevant phenotypes are attributed to subtle changes at the strain level rather than the species level. Despite the development of several tools using different algorithms, resolving individual strains from short-read paired-end sequencing data remains challenging. We developed MetaStrainer, a tool capable of reconstructing strain genotypes from metagenomic data. Compared with existing approaches, MetaStrainer substantially increases genotype accuracy, correctly identifies the number of strains, and accurately estimates their relative abundances. Accuracy of reconstructed genotypes is robust to choice of mapping reference.
Availability and implementation: MetaStrainer is implemented in Python 3. Source code and instructions are available on GitHub at https://www.github.com/lbobay/MetaStrainer and on Zenodo: https://doi.org/10.5281/zenodo.17872331
Contact: ljbobay@ncsu.edu
Supplementary Information: Supplementary data is available at Bioinformatics online.
Zenodo (CERN European Organization for Nuclear Research) · 2026-03-09
dataset · Open access · Senior author
Key datafiles for the manuscript entitled "DEX: a consensus-based amino acid exchangeability measure for improved codon substitution modelling" by Gavin Douglas and Louis-Marie Bobay. The key table most readers will be interested in is "all_distance_measures_symmetric.tsv.gz". This is a tab-delimited table with the pairwise amino acid distances based on all the measures we evaluated. Each row corresponds to a different amino acid pair, but note that the distances are symmetric for all measures (i.e., those with asymmetric distances between amino acids were averaged to be symmetric). The final columns in this table that indicate combined measures with "+" are non-focal DISTATIS consensus measures. The following subdirectories are in the compressed folder called "workflow_files" and contain the key files for running our analyses:
- aa_metrics: Working files for processing and analyzing AA distance/similarity measures. Note that those interested in the final measures should use "all_distance_measures_symmetric.tsv.gz".
- allele_freq_vs_predicted_effects: Key files used for analyzing segregating non-synonymous polymorphisms.
- PAML_workflow: Files for fitting codon substitution models with PAML.
- proteinGym: Files from the proteinGym database used for producing the custom DMS-EX measure.
Prevalence and Evolutionary Implications of Genome Rearrangements in Bacteria
Genome Biology and Evolution · 2026-01-14 · 1 citation
article · Open access · Senior author
The genetic material of bacteria and archaea is organized into various structures and setups, attesting that genome architecture is dynamic in these organisms. However, strong selective pressures are also acting to preserve genome organization, and it remains unclear how frequently genomes experience rearrangements and what mechanisms lead to these processes. Here, we assessed the dynamics and the drivers of genomic rearrangements across 121 microbial species. We show that synteny is highly conserved within most species, although several species present exceptionally flexible genomic layouts. Our results show that genomic rearrangements occur at a variable pace across bacteria and archaea, pointing to different selective constraints driving the accumulation of genomic changes across species. Importantly, we found that not only inversions but also translocations are highly enriched near the origin of replication (Ori), which suggests that many rearrangements may confer an adaptive advantage to the cell through the relocation of genes that benefit from gene dosage effects. Finally, our results confirm the view that mobile genetic elements, in particular transposable elements, are the main drivers of genomic translocations and inversions. Overall, our study shows that microbial species present largely stable genomic layouts and identifies key patterns and drivers of genome rearrangements in prokaryotes.
bioRxiv (Cold Spring Harbor Laboratory) · 2025-03-28
preprint · Open access
Abstract: Understanding the drivers and consequences of horizontal gene transfer (HGT) is a key goal of microbial evolution research. Although co-occurring taxa have long been appreciated to undergo HGT more often, this association is confounded with other factors, most notably their phylogenetic relatedness. To disentangle these factors, we analyzed 15,339 marine prokaryotic genomes (mainly bacteria) and their distribution in the global ocean. We identified HGT events across these genomes and enrichments for functions previously shown to be prone to HGT. By mapping metagenomic reads from 1,862 ocean samples to these genomes, we also identified co-occurrence patterns and environmental associations. Although we observed an expected negative association between HGT rates and phylogenetic distance, we only detected an association between co-occurrence and phylogenetic distance for closely related taxa. This observation refines the previously reported trend to closely related taxa, rather than a consistent pattern across all taxonomic levels, at least here within marine environments. In addition, we identified a significant association between co-occurrence and HGT, which remains even after controlling for phylogenetic distance and measured environmental variables. In a subset of samples with extended environmental data, we identified higher HGT levels associated with particle-attached bacteria and associations of varying directions with specific environmental variables, such as chlorophyll a and photosynthetically available radiation. Overall, our findings demonstrate the significant influence of ecological associations in shaping marine bacterial evolution through HGT.
Applying the Classic Test dN/dS to Detect Selection in Archaea
Methods in Molecular Biology · 2025-01-01 · 1 citation
article · Senior author
Introgression impacts the evolution of bacteria, but species borders are rarely fuzzy
Nature Communications · 2025-11-13 · 1 citation
article · Open access · Senior author
Most bacteria engage in gene flow through homologous recombination, and this mechanism may play a crucial role in maintaining species cohesiveness, much like sexual reproduction does in eukaryotes. However, introgression has been reported in bacteria and is associated with fuzzy species borders in some lineages, but its prevalence and impact on the delimitation of bacterial species have not been systematically characterized. Here, we use the term “introgression” to describe gene flow between the genomic backbone of distinct species (i.e., their core genomes), an analogy to the classical usage in sexual organisms, but distinct in mechanism. We quantified the patterns of introgression across 50 major bacterial lineages. Our results reveal that bacteria present various levels of introgression, with an average of 2% of introgressed core genes and up to 14% in Escherichia–Shigella. Furthermore, our results show that some species are more prone to introgression than others within the same genus, and introgression is most frequent between highly related species. We found evidence that the various levels of introgression across lineages are likely associated with sequence relatedness, but the impact of ecology on this process was less clear. Introgression can occasionally lead to fuzzy species borders, although many of these cases are likely instances of ongoing speciation. Overall, our results indicate that introgression has substantially shaped the evolution and the diversification of bacteria, but this process does not substantially blur species borders.
Editor's summary: It is commonly thought that bacterial species borders tend to be fuzzy, due to frequent exchange of DNA. Here, Diop et al. quantify the patterns of gene flow between core genomes across 50 major bacterial lineages, showing that defining species using a framework inspired by the Biological Species Concept allows clear species borders to be identified in most lineages.
The ISME Journal · 2025-12-10 · 1 citation
article · Open access
Understanding the drivers and consequences of horizontal gene transfer (HGT) is a key goal of microbial evolution research. Although co-occurring taxa have long been appreciated to undergo HGT more often, this association is confounded with other factors, most notably their phylogenetic relatedness. To disentangle these factors, we analyzed 15 339 marine prokaryotic genomes (mainly bacteria) and their distribution in the global ocean. We identified HGT events across these genomes and enrichments for functions previously shown to be prone to HGT. By mapping metagenomic reads from 1862 ocean samples to these genomes, we also identified co-occurrence patterns and environmental associations. Although we observed an expected negative association between HGT rates and phylogenetic distance, we only detected an association between co-occurrence and phylogenetic distance for closely related taxa. This observation refines the previously reported trend to closely related taxa, rather than a consistent pattern across all taxonomic levels, at least here within marine environments. In addition, we identified a significant association between co-occurrence and HGT, which remains even after controlling for phylogenetic distance and measured environmental variables. In a subset of samples with extended environmental data, we identified higher HGT levels associated with particle-attached prokaryotes and associations of varying directions with specific environmental variables, such as chlorophyll a and photosynthetically available radiation. Overall, our findings demonstrate the significant influence of ecological associations in shaping marine prokaryotic evolution through HGT.
Introgression impacts the evolution of bacteria, but species borders are rarely fuzzy
bioRxiv (Cold Spring Harbor Laboratory) · 2024-05-09
preprint · Open access · Senior author · Corresponding
Abstract: Most bacteria engage in gene flow, and this may act as a force maintaining species cohesiveness, as it does in sexual organisms. However, introgression (gene flow between the genomic backbone of distinct species) has been reported in bacteria and is associated with fuzzy species borders in some lineages, but its prevalence and impact on the delimitation of bacterial species have not been systematically characterized. Here, we quantified the patterns of introgression across 50 major bacterial lineages. Our results reveal that bacteria present various levels of introgression, with an average of 2% of introgressed core genes and up to 12% in Campylobacter. Furthermore, our results show that some species are more prone to introgression than others within the same genus, and introgression is most frequent between highly related species. We found evidence that the various levels of introgression across lineages are likely related to ecological proximity between species. Introgression can occasionally lead to fuzzy species borders, although many of these cases are likely instances of ongoing speciation. Overall, our results indicate that introgression has substantially shaped the evolution and the diversification of bacteria, but this process does not substantially blur species borders.
Frequent coauthors
- Marie Touchon · Centre National de la Recherche Scientifique · 32 shared
- Eduardo P. C. Rocha · Université Paris Cité · 32 shared
- Anne Chevallereau · Inserm · 20 shared
- Caroline M. Stott · North Carolina State University · 15 shared
- Florian Douam · Boston University · 15 shared
- Awa Diop · North Carolina State University · 15 shared
- François‐Loïc Cosset · École Normale Supérieure de Lyon · 15 shared
- Howard Ochman · The University of Texas at Austin · 15 shared