Sergei L Kosakovsky Pond


University of California, San Diego · Software Engineering

Active 2002–2026

h-index 22

Citations 3.2k

Papers402 last 5y

Funding —



About

Sergei L Kosakovsky Pond is an Associate Dean for Research & Innovation at the College of Science & Technology, Temple University. He served as Associate Dean from 1998 to 2003 and has conducted research on modeling the evolution of protein-coding DNA sequences. He holds a PhD; his dissertation, advised by Joseph C. Watkins, focused on modeling the evolution of protein-coding DNA sequences. His work emphasizes the development and application of mathematical and computational methods to understand biological evolution, particularly at the molecular level.

Research topics

Chemistry
Photochemistry
Biology
Materials science
Computational biology

Selected publications

Beyond Invariable Sites: Using Evolutionary Stasis to Map Multi-Layered Constraints on the Evolution of Viral and Mammalian Genomes
bioRxiv (Cold Spring Harbor Laboratory) · 2026-04-10
article · Open access · 1st author
The quantification of genomic conservation has progressed from foundational statistical modeling of evolutionary rates to state-of-the-art phylogeny-aware deep learning architectures. Yet, a fundamental resolution gap remains whenever evolutionary rates closely approach the "zero-rate origin," where standard selection inference tools will essentially ignore signals of extreme purifying selection at invariant genome sites. We present B-STILL (Bayesian Significance Test of Invariant Low Likelihoods), a hierarchical Bayesian framework designed to resolve the selective landscape of protein-coding data by leveraging gene-level calibration and codon-site specific evolutionary opportunity. This framework is based on computationally efficient approximations using codon-substitution models which are scalable to alignments with thousands of sequences. By explicitly tuning the stasis radius around the near-zero evolutionary-rate regime, B-STILL distinguishes between stochastic invariance and functional constraint, identifying Evolutionary Stasis Anchors (ESAs) where the upper bound on permitted evolutionary change is statistically anomalous relative to the background of the gene. This hierarchical approach provides a signature of functional or structural constraint that is often difficult to detect using other tools. Validation against extensive pathogen and clinical databases confirms that ESAs are predictors of biological fitness and disease potential. Collectively, we identified thousands of significantly clustered ESAs that precisely footprint both known functional domains and currently uncharacterized structural motifs in mammalian and viral genomes.
These findings establish B-STILL as a scalable statistical framework for high-resolution genomic annotation, transforming formerly ignored invariant genome and protein sites into informative markers of extreme purifying selection across both well-characterized and uncharacterized protein-coding genes from different domains of life.
Changing the Optics: Comparing Traditional and Retrieval-Augmented GenAI E-Tutorials in Interdisciplinary Learning
Open MIND · 2026-02-24
preprint
Understanding information-seeking behaviors in e-learning is critical, as learners must often make sense of complex and fragmented information, a challenge compounded in interdisciplinary fields with diverse prior knowledge. Compared to traditional e-tutorials, GenAI e-tutorials offer new ways to navigate information spaces, yet how they shape learners' information-seeking behaviors remains unclear. To address this gap, we characterized behavioral differences between traditional and GenAI-mediated e-tutorial learning using the three search modes of orienteering. We conducted a between-subject study in which learners engaged with either a traditional e-tutorial or a GenAI e-tutorial accessing the same underlying information content. We found that the traditional users maintained greater awareness and focus of the information space, whereas GenAI users exhibited more proactive and exploratory behaviors with lower cognitive load due to the querying-driven interaction. These findings offer guidance for designing tutorials in e-learning.
Dynamics of natural selection preceding human viral epidemics and pandemics
Cell · 2026-03-06 · 2 citations
article · Open access
Using a phylogenetic framework to characterize natural selection, we investigate the hypothesis that zoonotic viruses require adaptation prior to zoonosis to sustain human-to-human transmission. Examining the zoonotic emergence of Ebola virus, Marburg virus, mpox virus, influenza A virus, and SARS-CoV-2, we find no evidence of a change in selection intensity immediately prior to outbreaks in humans compared with typical selection within reservoir hosts. We found a change in selection on SARS-CoV in an intermediate host. We conclude that extensive pre-zoonotic adaptation is not necessary for human-to-human transmission of zoonotic viruses. In contrast, the reemergence of H1N1 influenza A virus in 1977 was preceded by a shift in selection intensity, consistent with the hypothesis of passage in a laboratory setting. Holistic phylogenetic analysis of selection regimes can be used to detect evolutionary signals of host switching or laboratory passage, providing insight into the circumstances of past and future viral emergence.
Changing the Optics: Comparing Traditional and Retrieval-Augmented GenAI E-Tutorials in Interdisciplinary Learning
arXiv (Cornell University) · 2026-02-24
article · Open access
Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models
eLife · 2025-08-28
article · Open access
Most phylogenetic trees are inferred using time-reversible evolutionary models that assume that the relative rates of substitution for any given pair of nucleotides are the same regardless of the direction of the substitutions. However, there is no reason to assume that the underlying biochemical mutational processes that cause substitutions are similarly symmetrical. We consider two non-reversible nucleotide substitution models: (1) a 6-rate non-reversible model (NREV6) that is applicable to analysing mutational processes in double-stranded genomes, in that complementary substitutions occur at identical rates and (2) a 12-rate non-reversible model (NREV12) that is applicable to analysing mutational processes in single-stranded (ss) genomes, in that all substitution types are free to occur at different rates. Using likelihood ratio and Akaike information criterion-based model tests, we show that, surprisingly, NREV12 provided a significantly better fit than the general time reversible (GTR) and NREV6 models to 21/31 dsRNA and 20/30 dsDNA datasets. As expected, however, NREV12 provided a significantly better fit to 24/33 ssDNA and 40/47 ssRNA datasets. We tested how non-reversibility impacts the accuracy with which phylogenetic trees are inferred. As simulated degrees of non-reversibility (DNRs) increased, the tree topology inferences using both NREV12 and GTR became more accurate, whereas inferred tree branch lengths became less accurate. We conclude that while non-reversible models should be helpful in the analysis of mutational processes in most virus species, there is no pressing need to use these models for routine phylogenetic inference.
A New Comparative Framework for Estimating Selection on Synonymous Substitutions
Molecular Biology and Evolution · 2025-03-23
article · Open access · Senior author
Selection on synonymous codon usage is a well-known and widespread phenomenon, yet existing models often do not account for it or its effect on synonymous substitution rates. In this article, we develop and expand the capabilities of multiclass synonymous substitution (MSS) models, which account for such selection by partitioning synonymous substitutions into 2 or more classes and estimating a relative substitution rate for each class, while accounting for important confounders like mutation bias. We identify extensive heterogeneity among relative synonymous substitution rates in an empirical dataset of ∼12,000 gene alignments from 12 Drosophila species. We validate model performance using data simulated under a forward population genetic simulation, demonstrating that MSS models are robust to model misspecification. MSS rates are significantly correlated with other covariates of selection on codon usage (population-level polymorphism data and tRNA abundance data), suggesting that models can detect weak signatures of selection on codon usage. With the MSS model, we can now study selection on synonymous substitutions in diverse taxa, independent of any a priori assumptions about the forces driving that selection.
Minus the Error: Testing for Positive Selection in the Presence of Residual Alignment Errors
eLife · 2025-06-26 · 1 citation
preprint · Open access · Senior author
Positive selection is an evolutionary process which increases the frequency of advantageous mutations because they confer a fitness benefit. Inferring the past action of positive selection on protein-coding sequences is fundamental for deciphering phenotypic diversity and the emergence of novel traits. With the advent of genome-wide comparative genomic datasets, researchers can analyze selection not only at the level of individual genes but also globally, delivering systems-level insights into evolutionary dynamics. However, genome-scale datasets are generated with automated pipelines and imperfect curation that does not eliminate all sequencing, annotation, and alignment errors. Positive selection inference methods are highly sensitive to such errors. We present BUSTED-E: a method designed to detect positive selection for amino acid diversification while concurrently identifying some alignment errors. This method builds on the flexible branch-site random effects model (BUSTED) for fitting distributions of dN/dS, with a critical modification: it incorporates an “error-sink” component to represent an abiological evolutionary regime. Using several genome-scale biological datasets that were extensively filtered using state-of-the-art automated alignment tools, we show that BUSTED-E identifies pervasive residual alignment errors, produces more realistic estimates of positive selection, reduces bias, and improves biological interpretation. The BUSTED-E model promises to be a more stringent filter to identify positive selection in genome-wide contexts, thus enabling further characterization and validation of the most biologically relevant cases.
MoleRate: comparing molecular relative evolutionary rates to detect convergent evolution
Evolution · 2025-12-11
article · Open access · Senior author
In comparative evolutionary genomics, faster or slower evolution of a particular gene, site, or branch in a phylogenetic tree, when compared to the appropriate average, has been interpreted as evidence of conservation, functional importance, or adaptation. With large consortia generating hundreds of genomes, there is an opportunity to interrogate these datasets for evidence of accelerated or reduced evolutionary rates in protein-coding genes associated with the presence or absence of a given phenotype (e.g., marine vs. terrestrial, nocturnal vs. diurnal). Such rate shifts can reflect the molecular basis of convergent phenotypic adaptation when they occur repeatedly across independent lineages. Here, we introduce an explicit phylogenetic rate test, MoleRate, for acceleration or reduction of nucleotide or protein evolutionary rates in focal lineages vs. the rest of the phylogeny. Compared to existing methods, MoleRate offers execution, explicit likelihood-based hypothesis testing, and the ability to detect and filter out potentially aberrant signal from single lineages. We demonstrate MoleRate's performance on simulated and empirical data, and apply it to several mammalian phenotypes. We also highlight its visualization capabilities, which enable exploration and communication of results. These analyses show that MoleRate detects biologically significant enrichments in selective pressure on specific functions related to the given phenotype, and that these enrichments are absent when random lineages are tested.
Author response: Viral genome sequence datasets display pervasive evidence of strand-specific substitution biases that are best described using non-reversible nucleotide substitution models
2025-09-30
peer-review · Open access
HIV-1 Rebound Virus Consists of a Small Number of Lineages That Entered the Reservoir Close to ART Initiation
bioRxiv (Cold Spring Harbor Laboratory) · 2025-01-31 · 3 citations
preprint · Open access
HIV-1 persists as a latent reservoir during suppressive antiretroviral therapy (ART). Viral rebound occurs upon ART interruption, posing a challenge to cure efforts. Characterizing viral populations fuelling rebound is imperative to curing HIV-1. We used longitudinal samples collected pretherapy from women in the CAPRISA 002 cohort to create an evolutionary timeline to determine the pretherapy timepoint when the rebound virus originally entered the long-lived reservoir. Participants (N=10) were untreated for an average of 5 years then on ART for an average of 2 years before viral rebound (defined as >1000 RNA copies/ml). env sequences were used to characterize the longitudinal pre-ART evolving viral RNA population, the proviral DNA reservoir during ART, and viral RNA in the plasma during rebound. For each participant, between 1 and 3 major viral lineages were identified in the plasma during rebound. A total of 20 rebound virus lineages were examined for the 10 participants, and 19 were found to have entered the reservoir around the time of therapy initiation. The one lineage estimated to enter the reservoir more than a year before therapy was observed in a participant who was untreated for more than 8 years, yet retained moderate CD4 T cell counts. Analysis of the viral DNA reservoir, from which the rebound viruses emanated, revealed that while 95% of rebounding lineages dated to the year before ART initiation, only 61% of unique proviruses dated to that time period. Strikingly, for three participants with DNA reservoirs dominated by viruses from earlier in untreated infection, only 33% of unique proviruses dated to the year before ART initiation, yet 83% of rebounding lineages dated to that time. Our results show that rebound virus almost exclusively comes from the portion of the latent reservoir that formed around the time of therapy initiation, even when the reservoir is composed of diverse sequences from across the pre-ART time period.
Author Summary: HIV-1 is maintained in a long-lived reservoir during suppressive therapy. Virus rebounds if therapy is discontinued. We found that in most cases rebound virus comes from a pool of viral sequences that entered the long-lived reservoir around the time of therapy initiation. While the viral DNA reservoir is on average also skewed toward sequences replicating around the time of therapy initiation, the rebound virus almost exclusively comes from this portion of the latent reservoir, even when the reservoir contained proviruses from much earlier in untreated infection. Thus, we hypothesize that there are features of the viruses forming the latent reservoir around the time of therapy initiation, or features of the host at that time, that select these viruses as initiators of rebound during therapy discontinuation.

Frequent coauthors

Joseph W. Perry
46 shared
Mariacristina Rumi
United States Air Force Research Laboratory
38 shared
Seth R. Marder
26 shared
Glenn P. Bartholomew
16 shared
Jean‐Luc Brédas
University of Arizona
16 shared
Guillermo C. Bazan
National University of Singapore
16 shared
Sergei Tretiak
Los Alamos National Laboratory
16 shared
Timothy C. Parker
Georgia Institute of Technology
11 shared

Education

PhD, Applied Mathematics
University of Arizona
2003
