Mohammed El-Kebir
Associate Professor · University of Illinois Urbana-Champaign · Statistics and Computer Science
Active 2007–2026
About
Mohammed El-Kebir is an associate professor at the University of Illinois at Urbana-Champaign, with affiliate appointments in the Department of Electrical and Computer Engineering, the Institute of Genomic Biology, and the National Center for Supercomputing Applications. He received his Ph.D. in Computer Science from VU University Amsterdam and Centrum Wiskunde & Informatica in 2015, and holds master's degrees in Bioinformatics and Computer Science and Engineering from VU University Amsterdam and Eindhoven University of Technology, respectively. His postdoctoral training was conducted with Ben Raphael at Brown University and Princeton University from 2014 to 2017. El-Kebir's main research area involves combinatorial optimization algorithms applied to problems in computational biology, particularly in cancer genomics. His work focuses on developing methods for estimating cancer phylogenies from sequencing data, including single-cell sequencing, and creating mathematical models to study cancer evolution and metastasis. His contributions include advances in the theoretical foundations of cancer phylogenetics, methods for tumor phylogeny estimation, and the development of comprehensive evolutionary models for somatic mutations. Recognized with awards such as the NSF CISE Research Initiation Initiative (CRII) Award in 2019 and the NSF CAREER Award in 2021, El-Kebir's research aims to improve scientific discovery through novel problem statements and analytical methods for omics data.
Research topics
- Evolutionary biology
- Medicine
- Biology
- Computer Science
- Genetics
- Virology
- Computational biology
- Mathematics
- Library science
- Gerontology
- Statistics
Selected publications
Inferring and summarizing tumor phylogenies from bulk DNA data
Figshare · 2026-02-18
other · Open access · Senior author
Background: Cancer phylogenies are key to understanding tumor evolution. However, due to the uncertainty in phylogenetic estimation, one typically infers many, equally-plausible phylogenies from bulk DNA sequencing data of tumors, hindering downstream analysis that relies on correct phylogenies. Results: To resolve this challenge, we introduce Sapling, a method to solve two variants of the Backbone Tree Inference from Reads problem, which seeks a small set of backbone trees on a subset of mutations that collectively summarize the space of plausible cancer phylogenies. We prove that the problems are NP-hard. Conclusions: On simulated and real data, we demonstrate that Sapling is capable of inferring high-quality backbone trees that adequately summarize the space of plausible cancer phylogenies. In addition, we demonstrate that Sapling is able to infer full-size trees with higher likelihoods than state-of-the-art methods.
Inferring and summarizing tumor phylogenies from bulk DNA data
Algorithms for Molecular Biology · 2026-02-18
article · Open access · Senior author · Corresponding
Background: Cancer phylogenies are key to understanding tumor evolution. However, due to the uncertainty in phylogenetic estimation, one typically infers many, equally-plausible phylogenies from bulk DNA sequencing data of tumors, hindering downstream analysis that relies on correct phylogenies. Results: To resolve this challenge, we introduce Sapling, a method to solve two variants of the Backbone Tree Inference from Reads problem, which seeks a small set of backbone trees on a subset of mutations that collectively summarize the space of plausible cancer phylogenies. We prove that the problems are NP-hard. Conclusions: On simulated and real data, we demonstrate that Sapling is capable of inferring high-quality backbone trees that adequately summarize the space of plausible cancer phylogenies. In addition, we demonstrate that Sapling is able to infer full-size trees with higher likelihoods than state-of-the-art methods.
Deconvolving Phylogenetic Distance Mixtures
bioRxiv (Cold Spring Harbor Laboratory) · 2026-01-21
article · Open access
Mixtures of multiple constituent organisms are sequenced in several widely used applications, including metagenomics and metabarcoding. Characterizing the elements of the sequence mixture and their abundance with respect to a reference set of known organisms has been the subject of intense research across several domains, including microbiome analyses, and methods must overcome two key challenges. First, the mixture constituents are related to each other through an evolutionary history, and hence, should not be considered independent entities. Second, sequence data is noisy, with each short read providing a limited signal. While existing approaches attempt to address these challenges, addressing both challenges simultaneously has proved challenging. For evolutionary dependencies, methods either define hierarchical clusters (e.g., taxonomies or operational taxonomic/genomic units) or use phylogenetic trees. For the second challenge, they either assemble reads into contigs, use statistical priors to summarize read placements, or attempt to analyze all reads jointly using k-mers. Despite this rich literature, a natural approach to simultaneously address both challenges has been underexplored: compute a distance from the mixture to all references, deconvolve those distances, and place the sample on multiple branches of a reference phylogeny with associated abundances. This multi-placement approach is a natural extension of the single-read phylogenetic placement used in practice. We argue that by placing the entire sample on multiple branches instead of placing reads individually, we can obtain a less noisy profile of the mixture. We formalize this approach as the phylogenetic distance deconvolution (PDD) problem, show some limits on the identifiability of PDDs, propose a slow exact algorithm, and an efficient heuristic greedy algorithm with local refinements. Benchmarking shows that these heuristics are effective and that our implementation of the PDD approach (called DecoDiPhy) can accurately deconvolve phylogenetic mixture distances while scaling quadratically. Applied to metagenomics, DecoDiPhy consolidates reads mapped to a large number of branches on a reference tree to a much smaller number of placements. The consolidated placements improve the accuracy of downstream tasks, such as sample differentiation and detection of differentially abundant taxa.
Additional file 1 of Inferring and summarizing tumor phylogenies from bulk DNA data
Figshare · 2026-02-18
article · Open access · Senior author
Additional file 1. Supplemental Materials and Methods.
Summarizing RNA Structural Ensembles via Maximum Agreement Secondary Structures
bioRxiv (Cold Spring Harbor Laboratory) · 2026-02-26
article · Open access · Senior author · Corresponding
Summarizing a collection P of related RNA secondary structures is a key challenge in applications like evolutionary analysis, alternative fold studies and mRNA vaccine design. This requires both clustering the input structures into similar groups and identifying the core structural motifs on which they agree or differ. Existing methods fail by focusing on only one of these goals: clustering methods do not output shared motifs, while consensus methods overlook the structural diversity present in the collection. Here, we introduce the Maximum Agreement Secondary Structures (MASS) problem, which seeks the largest set F of structural features present in P that partition the input structures into a user-specified number τ of distinct clusters. We prove that MASS is NP-hard and also establish its equivalence to a constrained binary matrix projection problem. We present an exact integer linear program, an exact combinatorial algorithm, and a scalable beam-search heuristic. Using simulations we demonstrate the performance of these exact algorithms and heuristics relative to baseline methods that focus on either clustering or identifying a single consensus tree. On real data, we demonstrate that MASS identifies conserved scaffolds in conformational datasets, reveals conserved structural motifs in different species within RNA families, and recovers shared structural features among synonymous transcripts encoding the same protein. MASS provides a general and interpretable framework for summarizing RNA structural organization.
Lecture notes in computer science · 2025-01-01 · 1 citation
book-chapter · Senior author
Fast tumor phylogeny regression via tree-structured dual dynamic programming
bioRxiv (Cold Spring Harbor Laboratory) · 2025-01-27 · 1 citation
preprint · Open access · Senior author · Corresponding
Motivation: Reconstructing the evolutionary history of tumors from bulk DNA sequencing of multiple tissue samples remains a challenging computational problem, requiring simultaneous deconvolution of the tumor tissue and inference of its evolutionary history. Recently, phylogenetic reconstruction methods have made significant progress by breaking the reconstruction problem into two parts: a regression problem over a fixed topology and a search over tree space. While effective techniques have been developed for the latter search problem, the regression problem remains a bottleneck in both method design and implementation due to the lack of fast, specialized algorithms. Results: Here, we introduce fastppm, a fast tool to solve the regression problem via tree-structured dual dynamic programming. fastppm supports arbitrary, separable convex loss functions including the ℓ2, piecewise linear, binomial and beta-binomial loss and provides asymptotic improvements for the ℓ2 and piecewise linear loss over existing algorithms. We find that fastppm empirically outperforms both specialized and general purpose regression algorithms, obtaining 50-450× speedups while providing as accurate solutions as existing approaches. Incorporating fastppm into several phylogeny inference algorithms immediately yields up to 400× speedups, requiring only a small change to the program code of existing software. Finally, fastppm enables analysis of low-coverage bulk DNA sequencing data on both simulated data and in a patient-derived mouse model of colorectal cancer, outperforming state-of-the-art phylogeny inference algorithms in terms of both accuracy and runtime. Availability: fastppm is implemented in C++ and available as both a command-line interface and Python library at github.com/elkebir-group/fastppm.git.
Characterizing the Solution Space of Migration Histories of Metastatic Cancers with MACH2
Lecture notes in computer science · 2025-01-01
book-chapter · Senior author
Recent grants
NSF · $175k · 2019–2022
RAPID: Deciphering Within-host Diversity and Multi-strain Infections in COVID-19
NSF · $100k · 2020–2021
NSF · $500k · 2021–2027
Frequent coauthors
- 67 shared
Gunnar W. Klau
Life Science Center Düsseldorf (Germany)
- 34 shared
Benjamin J. Raphael
- 31 shared
Valentina Boeva
- 29 shared
Simone Zaccaria
University College London
- 26 shared
Idoia Ochoa
Universidad de Navarra
- 22 shared
Bas E. Dutilh
Friedrich Schiller University Jena
- 17 shared
Victor Guryev
University Medical Center Groningen
- 17 shared
Palash Sashittal
Princeton University
Labs
The Grainger College of Engineering · PI
Education
- 2014
PhD
Centrum Wiskunde en Informatica
- 2010
MSc, Bioinformatics
Vrije Universiteit Amsterdam
- 2009
MSc, Computer Science and Engineering
Technische Universiteit Eindhoven
- 2006
BSc, Computer Science and Engineering
Technische Universiteit Eindhoven
Awards & honors
- National Science Foundation CISE Research Initiation Initiat…
- CAREER Award (2021)