
Steven Evans
University of California, Berkeley · Center for Computational Biology
Active 1969–2025
About
Steven Evans is a Professor of Statistics at the University of California, Berkeley, affiliated with the Center for Computational Biology. His research spans statistics, biostatistics, evolutionary biology, and phylogenetics, with a focus on large random combinatorial structures, random matrices, superprocesses, and other measure-valued processes. He also studies probability on algebraic structures, particularly local fields, and applies stochastic processes to biodemography, mathematical finance, population genetics, phylogenetics, and historical linguistics. This work develops the mathematical foundations of these probabilistic models and applies them to evolutionary processes and biological data analysis.
Research topics
- Mathematics
- Combinatorics
- Discrete mathematics
- Statistical physics
- Pure mathematics
Selected publications
Correction to "Addressing Polymorphism in Linguistic Phylogenetics"
Transactions of the Philological Society · 2025-12-14
article · Open access
Abstract: In Canby et al. 2024, TPS 122(2), 199–222, we provided a new model of linguistic evolution that addressed polymorphism, a simulation study comparing several phylogeny estimation methods under this model showing that a maximum parsimony method (MP4) had the best accuracy, and a phylogenetic analysis using maximum parsimony of an updated Ringe & Taylor IE dataset with polymorphic characters included. We recently discovered that the way we estimated phylogenetic trees for the simulation study used an incorrect script for performing binary encodings of multi‐state characters (originally developed for a different question) that did not produce a binary encoding for any monomorphic character. Here, we provide the updated figures and table that have correct values for the methods that employ the binary encoding in the simulation studies we performed.
Addressing Polymorphism in Linguistic Phylogenetics
Transactions of the Philological Society · 2024-04-09 · 4 citations
article · Open access
Abstract: Understanding how languages change is important not only for the reconstruction of protolanguages and for estimating diversification dates (i.e. the dates when languages split), but also for the inference of evolutionary trees (or phylogenetic networks) of language families. We propose a parametric model of language change that addresses lexical polymorphism (two or more words for a given basic meaning) based on what is known about how languages change. Under our model, changes of state in lexical characters occur only due to semantic shift or borrowing, leading to (potentially brief) periods in which polymorphism is present. Across a wide range of model conditions, we find that a simple and natural modification to the maximum parsimony (MP) criterion (which seeks the tree with the fewest number of changes) to allow it to handle polymorphic characters has the best accuracy, substantially improving on well‐known Bayesian methods based on appearances and disappearances of words. We also provide a new analysis of Indo–European that takes polymorphism into account, finding support for a previous tree (Nakhleh et al., 2006) and a new tree that differs from the previous tree in the relationship between Italo‐Celtic and Tocharian.
Lecture notes in computer science · 2024-01-01 · 2 citations
book-chapter · Senior author
Advances in Estimating Level-1 Phylogenetic Networks from Unrooted SNPs
bioRxiv (Cold Spring Harbor Laboratory) · 2024-07-23 · 1 citation
preprint · Open access · Senior author
Abstract: We address the problem of how to estimate a phylogenetic network when given SNPs (i.e., single nucleotide polymorphisms, or bi-allelic markers that have evolved under the infinite sites assumption). We focus on level-1 phylogenetic networks (i.e., networks where the cycles are node-disjoint), since more complex networks are unidentifiable. We provide a polynomial time quartet-based method that we prove correct for reconstructing the unrooted topology of any level-1 phylogenetic network N, if we are given a set of SNPs that covers all the bipartitions of N, even if the ancestral state is not known, provided that the cycles are of length at least 5; we also prove that an algorithm developed by Dan Gusfield in JCSS 2005 correctly recovers the unrooted topology in polynomial time in this case. To the best of our knowledge, this is the first result to establish that the unrooted topology of a level-1 network is uniquely recoverable from SNPs without known ancestral states. We also present a stochastic model for DNA evolution, and we prove that the two methods (our quartet-based method and Gusfield’s method) are statistically consistent estimators of the unrooted topology of the level-1 phylogenetic network. For the case of multi-state homoplasy-free characters, we prove that our quartet-based method correctly constructs the unrooted topology of level-1 networks under the required conditions (all cycles of length at least five), while Gusfield’s algorithm cannot be used in that condition. These results assume that we have access to an oracle for indicating which sites in the DNA alignment are homoplasy-free, and we show that the methods are robust, under some conditions, to oracle errors.
Progress on Constructing Phylogenetic Networks for Languages
2024-01-01
book-chapter
Mean-field interacting multi-type birth–death processes with a view to applications in phylodynamics
Theoretical Population Biology · 2024-07-15 · 5 citations
article
Advances in Estimating Level-1 Phylogenetic Networks from Unrooted SNPs
Journal of Computational Biology · 2024-11-25 · 7 citations
article · Senior author
Abstract: We address the problem of how to estimate a phylogenetic network when given single-nucleotide polymorphisms (i.e., SNPs, or bi-allelic markers that have evolved under the infinite sites assumption). We focus on level-1 phylogenetic networks (i.e., networks where the cycles are node-disjoint), since more complex networks are unidentifiable. We provide a polynomial time quartet-based method that we prove correct for reconstructing the semi-directed level-1 phylogenetic network N, if we are given a set of SNPs that covers all the bipartitions of N, even if the ancestral state is not known, provided that the cycles are of length at least 5; we also prove that an algorithm developed by Dan Gusfield in the Journal of Computer and System Sciences in 2005 correctly recovers semi-directed level-1 phylogenetic networks in polynomial time in this case. We present a stochastic model for DNA evolution, and we prove that the two methods (our quartet-based method and Gusfield’s method) are statistically consistent estimators of the semi-directed level-1 phylogenetic network. For the case of multi-state homoplasy-free characters, we prove that our quartet-based method correctly constructs semi-directed level-1 networks under the required conditions (all cycles of length at least five), while Gusfield’s algorithm cannot be used in that case. These results assume that we have access to an oracle for indicating which sites in the DNA alignment are homoplasy-free, and we show that the methods are robust, under some conditions, to oracle errors.
Mean-field interacting multi-type birth-death processes with a view to applications in phylodynamics
arXiv (Cornell University) · 2023-07-12
preprint · Open access
Abstract: Multi-type birth-death processes underlie approaches for inferring evolutionary dynamics from phylogenetic trees across biological scales, ranging from deep-time species macroevolution to rapid viral evolution and somatic cellular proliferation. A limitation of current phylogenetic birth-death models is that they require restrictive linearity assumptions that yield tractable message-passing likelihoods, but that also preclude interactions between individuals. Many fundamental evolutionary processes (such as environmental carrying capacity or frequency-dependent selection) entail interactions, and may strongly influence the dynamics in some systems. Here, we introduce a multi-type birth-death process in mean-field interaction with an ensemble of replicas of the focal process. We prove that, under quite general conditions, the ensemble's stochastically evolving interaction field converges to a deterministic trajectory in the limit of an infinite ensemble. In this limit, the replicas effectively decouple, and self-consistent interactions appear as nonlinearities in the infinitesimal generator of the focal process. We investigate a special case that is rich enough to model both carrying capacity and frequency-dependent selection while yielding tractable message-passing likelihoods in the context of a phylogenetic birth-death model.
Progress on Constructing Phylogenetic Networks for Languages
arXiv (Cornell University) · 2023-06-09
preprint · Open access
Abstract: In 2006, Warnow, Evans, Ringe, and Nakhleh proposed a stochastic model (hereafter, the WERN 2006 model) of multi-state linguistic character evolution that allowed for homoplasy and borrowing. They proved that if there is no borrowing between languages and homoplastic states are known in advance, then the phylogenetic tree of a set of languages is statistically identifiable under this model, and they presented statistically consistent methods for estimating these phylogenetic trees. However, they left open the question of whether a phylogenetic network (which would explicitly model borrowing between languages that are in contact) can be estimated under the model of character evolution. Here, we establish that under some mild additional constraints on the WERN 2006 model, the phylogenetic network topology is statistically identifiable, and we present algorithms to infer the phylogenetic network. We discuss the ramifications for linguistic phylogenetic network estimation in practice, and suggest directions for future research.
Limit theorems for Fréchet mean sets
Bernoulli · 2023-11-08 · 9 citations
article · 1st author · Corresponding
Abstract: For 1≤p≤∞, the Fréchet p-mean of a probability measure on a metric space is an important notion of central tendency that generalizes the usual notions in the real line of mean (p=2) and median (p=1). In this work we prove a collection of limit theorems for Fréchet means and related objects, which, in general, constitute a sequence of random closed sets. On the one hand, we show that many limit theorems (a strong law of large numbers, an ergodic theorem, and a large deviations principle) can be simply descended from analogous theorems on the space of probability measures via purely topological considerations. On the other hand, we provide the first sufficient conditions for the strong law of large numbers to hold in a T2 topology (in particular, the Fell topology), and we show that this condition is necessary in some special cases. We also discuss statistical and computational implications of the results herein.
Recent grants
Limits via sampling of large discrete and continuous structures
NSF · $300k · 2015–2019
Random matrices, real trees, mortality models, and stepping-stone processes
NSF · $384k · 2004–2009
NSF · $571k · 2009–2016
Frequent coauthors
- Alexandru Hening · Naval Information Warfare Center Pacific · 25 shared
- David Steinsaltz · 18 shared
- Kenneth W. Wachter · University of California, Berkeley · 16 shared
- Edwin Perkins · University of British Columbia · 14 shared
- Anton Wakolbinger · Goethe University Frankfurt · 13 shared
- Tandy Warnow · University of Illinois Urbana-Champaign · 11 shared
- F. A. Matsen · Fred Hutch Cancer Center · 9 shared
- Klaus Fleischmann · 9 shared