
Srinivas Aluru
Senior Associate Dean and Regents' Professor · Georgia Institute of Technology · Computer Science
Active 1992–2026
About
Srinivas Aluru is a Regents' Professor in the School of Computational Science and Engineering within the College of Computing at Georgia Institute of Technology. He conducts research in high performance computing, bioinformatics and systems biology, combinatorial scientific computing, and applied algorithms. He pioneered the development of parallel methods in computational biology and contributed to the assembly and analysis of complex plant genomes. His group is currently focused on developing bioinformatics methods for high-throughput DNA sequencing, particularly error correction and genome assembly. In systems biology, his group works on network inference methods using mutual information and Bayesian approaches, as well as network analysis techniques to further the understanding of partially characterized pathways. His contributions in scientific computing include the parallel Fast Multipole Method, domain decomposition methods, spatial data structures, and applications in computational electromagnetics and materials informatics. Aluru is a Fellow of the American Association for the Advancement of Science (AAAS) and the Institute of Electrical and Electronics Engineers (IEEE). He has received awards such as the NSF CAREER Award, the IBM Faculty Award, and the Swarnajayanti Fellowship from the Government of India. He also serves on the editorial boards of several prominent journals.
Research topics
- Computer Science
- Artificial Intelligence
- Data Mining
- Biology
- Genetics
- Algorithm
- Computational biology
- Parallel computing
- Database
Selected publications
PBMC Dataset for Evaluation of Parallel Ensemble Network Construction
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-20
dataset · Open access · Senior author
This is a real dataset used in the evaluation of gene networks. It is processed from a 10X dataset labeled "20k Human PBMCs, 3' HT v3.1, Chromium X v3.1" (downloaded from https://www.10xgenomics.com/datasets/20-k-human-pbm-cs-3-ht-v-3-1-chromium-x-3-1-high-6-1-0). The dataset is processed using a scanpy workflow for quality control, and the top 5,000 highly variable genes are selected.
Lung Cancer Single-cell Datasets used in Evaluation of Parallel Ensemble Networks
Zenodo (CERN European Organization for Nuclear Research) · 2026-04-20
dataset · Open access · Senior author
A collection of eight datasets derived from the lung cancer dataset of the pan-cancer blueprint study: Qian, J., Olbrecht, S., Boeckx, B. et al. A pan-cancer blueprint of the heterogeneous tumor microenvironment revealed by single-cell profiling. Cell Res 30, 745–762 (2020). https://doi.org/10.1038/s41422-020-0355-0 The eight datasets are generated by selecting the top 1,000, 3,000, 5,000, 8,000, 10,000, 12,000, 15,000, and 18,000 highly variable genes from the 91,068-cell dataset of lung tissues from lung cancer patients. Also included are row-major versions of the datasets, in which genes correspond to rows and cells correspond to columns.
Disambiguating a Soft Metagenomic Clustering
Journal of Computational Biology · 2025-03-07
article · Senior author
Clustering is a popular technique used for analyzing amplicon sequencing data in metagenomics. Specifically, it is used to assign sequences (reads) to clusters, each cluster representing a species or a higher level taxonomic unit. Reads from multiple species often sharing subsequences, combined with lack of a perfect similarity measure, make it difficult to correctly assign reads to clusters. Thus, metagenomic clustering methods must either resort to ambiguity, or make the best available choice at each read assignment stage, which could lead to incorrect clusters and potentially cascading errors. In this article, we argue for first generating an ambiguous clustering and then resolving the ambiguities collectively by analyzing the ambiguous clusters. We propose a rigorous formulation of this problem and show that it is NP-hard. We then propose an efficient heuristic to solve it in practice. We validate our approach on several synthetically generated datasets and two datasets consisting of 16S rDNA sequences from the microbiome of rat guts.
The Power of Parallelism: Accelerating Discovery in the Biosciences
2025-06-03
article · 1st author · Corresponding
The transformative power of parallel computing is most evident when tackling problems that would otherwise be intractable due to immense computational demands, memory constraints, or time-to-solution limitations. Once an intellectual curiosity, parallel computational biology has evolved into an indispensable tool for modern biological research, driven by the rapid proliferation of high-throughput instrumentation. This talk will examine the expanding role of parallel computing in biosciences, highlighting key challenges my group has addressed in computational genomics and systems biology over the past twenty-five years. These challenges have spurred the development of new algorithmic innovations involving strings, graphs, and complex system learning, with broad applicability beyond biology. As the field continues to evolve, emerging applications present fresh opportunities for parallel computing to drive discovery and innovation in the biosciences.
SCEMENT: scalable and memory efficient integration of large-scale single-cell RNA-sequencing data
Bioinformatics · 2025-02-01 · 1 citation
article · Open access · Senior author
MOTIVATION: Integrative analysis of large-scale single-cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single-cell RNA-sequencing data integration, many lack the scalability to handle large numbers of datasets and/or millions of cells due to their memory and run time requirements. The few tools that can handle large data do so by reducing the computational burden through strategies such as subsampling of the data or selecting a reference dataset to improve computational efficiency and scalability. Such shortcuts, however, hamper the accuracy of downstream analyses, especially those requiring quantitative gene expression information. RESULTS: We present SCEMENT, a SCalablE and Memory-Efficient iNTegration method, to overcome these limitations. Our new parallel algorithm builds upon and extends the linear regression model previously applied in ComBat to an unsupervised sparse matrix setting to enable accurate integration of diverse and large collections of single-cell RNA-sequencing data. Using tens to hundreds of real single-cell RNA-seq datasets, we show that SCEMENT outperforms ComBat as well as FastIntegration and Scanorama in runtime (up to 214× faster) and memory usage (up to 17.5× less). It not only performs batch correction and integration of millions of cells in under 25 min, but also facilitates the discovery of new rare cell types and more robust reconstruction of gene regulatory networks with full quantitative gene expression information. AVAILABILITY AND IMPLEMENTATION: Source code freely available for download at https://github.com/AluruLab/scement, implemented in C++ and supported on Linux.
Efficient and effective methods for variant selection
2025-10-12
article · Open access · 1st author · Corresponding
Variation graphs succinctly capture genetic variations among individuals within a species or a target population. The use of variation graphs instead of a single reference genome is credited with reducing bias and increasing the accuracy of sequence mapping algorithms. However, complete variation graphs that comprehensively incorporate all genetic variations are often found to be ineffective and inaccurate in practice due to the presence of a combinatorially explosive number of paths in the graph that do not correspond to any observed genome. Thus, a balance is struck in carefully selecting a subset of variants to be incorporated, for which mathematical frameworks have recently been developed. We advance the mathematical framework proposed by Jain et al. [Bioinformatics, 37 (2021)], where integer linear programming formulations were developed for optimal variant selection. We propose novel graph-based formulations and develop exact and fast algorithms for certain cases, approximation methods for some others, and empirically close to optimal results in all cases. The primary advantage of the algorithms designed here is that they provide near-optimal results at run times orders of magnitude faster than those of an ILP solver.
SCEMENT: Scalable and Memory Efficient Integration of Large-scale Single Cell RNA-sequencing Data
bioRxiv (Cold Spring Harbor Laboratory) · 2024-07-02
preprint · Open access · Senior author · Corresponding
Motivation: Integrative analysis of large-scale single cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single cell RNA-sequencing data integration, many lack scalability to handle large numbers of datasets and/or millions of cells due to their memory and run time requirements. The few tools which can handle large data do so by reducing the computational burden through strategies such as subsampling of the data or selecting a reference dataset, to improve computational efficiency and scalability. Such shortcuts however hamper accuracy of downstream analyses, especially those requiring quantitative gene expression information. Results: We present SCEMENT, a SCalablE and Memory-Efficient iNTegration method to overcome these limitations. Our new parallel algorithm builds upon and extends the linear regression model previously applied in ComBat, to an unsupervised sparse matrix setting to enable accurate integration of diverse and large collections of single cell RNA-sequencing data. Using tens to hundreds of real single cell RNA-seq datasets, we show that SCEMENT outperforms ComBat as well as FastIntegration and Scanorama in runtime (up to 214X faster) and memory usage (up to 17.5X less). It not only performs batch correction and integration of millions of cells in under 25 minutes, but also facilitates discovery of new rare cell-types and more robust reconstruction of gene regulatory networks with full quantitative gene expression information. Availability and implementation: Source code freely available for download at https://github.com/AluruLab/scement, implemented in C++ and supported on Linux. Contact: aluru@cc.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Recent grants
AF: Medium: Parallel Algorithms and Software for High-Throughput Sequence Assembly
NSF · $925k · 2013–2018
NSF · $383k · 2008–2012
EAGER: A Framework for Learning Graph Algorithms with Applications to Social and Gene Networks
NSF · $300k · 2018–2021
AF: Small: Algorithmic Techniques for High-throughput Analysis of Long Reads
NSF · $425k · 2018–2022
NSF · $163k · 2003–2007
Frequent coauthors
- David A. Bader · 318 shared
- Guojing Cong, Oak Ridge National Laboratory · 300 shared
- Jesper Larsson Träff · 216 shared
- Piotr Łuszczek · 214 shared
- Jack Dongarra · 208 shared
- Jarosław Żola, University at Buffalo, State University of New York · 180 shared
- Felix Wolf, Technical University of Darmstadt · 152 shared
- Xiaoye Sherry Li · 152 shared
Education
- 1990
Ph.D., Computer Science
University of California, San Diego
- 1986
M.S., Computer Science
University of California, San Diego
- 1984
B.S., Computer Science and Engineering
Indian Institute of Technology, Kanpur
Awards & honors
- NSF CAREER Award (1997)
- IBM Faculty Award (2002)
- Swarnajayanti Fellowship from the Government of India (2007)
- Fellow of the American Association for the Advancement of Sc…
- Fellow of the Institute for Electrical and Electronic Engine…