Spencer V. Muse

Professor of Statistics · Director of Bioinformatics Graduate Program · Director of Statistics Undergraduate Program · Verified

North Carolina State University · Statistics

Active 1992–2025

h-index: 31
Citations: 12.6k
Papers493 last 5y
Funding: $2.5M

Research topics

  • Computer Science
  • Artificial Intelligence
  • Biology
  • Statistics
  • Mathematics
  • Machine Learning
  • Econometrics
  • Evolutionary biology
  • Genetics
  • Economics
  • Paleontology
  • Algorithm
  • Computational biology
  • Mathematical analysis

Selected publications

  • Minus the Error: Testing for Positive Selection in the Presence of Residual Alignment Errors

    eLife · 2025-06-26 · 1 citation

    preprint · Open access

    Positive selection is an evolutionary process which increases the frequency of advantageous mutations because they confer a fitness benefit. Inferring the past action of positive selection on protein-coding sequences is fundamental for deciphering phenotypic diversity and the emergence of novel traits. With the advent of genome-wide comparative genomic datasets, researchers can analyze selection not only at the level of individual genes but also globally, delivering systems-level insights into evolutionary dynamics. However, genome-scale datasets are generated with automated pipelines and imperfect curation that does not eliminate all sequencing, annotation, and alignment errors. Positive selection inference methods are highly sensitive to such errors. We present BUSTED-E: a method designed to detect positive selection for amino acid diversification while concurrently identifying some alignment errors. This method builds on the flexible branch-site random effects model (BUSTED) for fitting distributions of dN/dS, with a critical modification: it incorporates an “error-sink” component to represent an abiological evolutionary regime. Using several genome-scale biological datasets that were extensively filtered using state-of-the-art automated alignment tools, we show that BUSTED-E identifies pervasive residual alignment errors, produces more realistic estimates of positive selection, reduces bias, and improves biological interpretation. The BUSTED-E model promises to be a more stringent filter to identify positive selection in genome-wide contexts, thus enabling further characterization and validation of the most biologically relevant cases.

  • Author response: Minus the Error: Testing for Positive Selection in the Presence of Residual Alignment Errors

    2025-06-26

    peer-review · Open access


  • Minus the Error: Testing for Positive Selection in the Presence of Residual Alignment Errors

    bioRxiv (Cold Spring Harbor Laboratory) · 2024 · 8 citations

    • Computer Science
    • Artificial Intelligence
    • Statistics


  • Equiprobable discrete models of site-specific substitution rates underestimate the extent of rate variability

    PLoS ONE · 2020 · 3 citations

    Senior author · Corresponding
    • Computer Science
    • Statistics
    • Mathematics

    It is standard practice to model site-to-site variability of substitution rates by discretizing a continuous distribution into a small number, K, of equiprobable rate categories. We demonstrate that the variance of this discretized distribution has an upper bound determined solely by the choice of K and the mean of the distribution. This bound can introduce biases into statistical inference, especially when estimating parameters governing site-to-site variability of substitution rates. Applications to two large collections of sequence alignments demonstrate that this upper bound is often reached in analyses of real data. When parameter estimation is of primary interest, additional rate categories or more flexible modeling methods should be considered.

  • Synonymous Site-to-Site Substitution Rate Variation Dramatically Inflates False Positive Rates of Selection Analyses: Ignore at Your Own Peril

    Molecular Biology and Evolution · 2020 · 103 citations

    Senior author · Corresponding
    • Machine Learning
    • Computer Science
    • Biology

    Most molecular evolutionary studies of natural selection maintain the decades-old assumption that synonymous substitution rate variation (SRV) across sites within genes occurs at levels that are either nonexistent or negligible. However, numerous studies challenge this assumption from a biological perspective and show that SRV is comparable in magnitude to that of nonsynonymous substitution rate variation. We evaluated the impact of this assumption on methods for inferring selection at the molecular level by incorporating SRV into an existing method (BUSTED) for detecting signatures of episodic diversifying selection in genes. Using simulated data we found that failing to account for even moderate levels of SRV in selection testing is likely to produce intolerably high false positive rates. To evaluate the effect of the SRV assumption on actual inferences we compared results of tests with and without the assumption in an empirical analysis of over 13,000 Euteleostomi (bony vertebrate) gene alignments from the Selectome database. This exercise reveals that close to 50% of positive results (i.e., evidence for selection) in empirical analyses disappear when SRV is modeled as part of the statistical analysis and are thus candidates for being false positives. The results from this work add to a growing literature establishing that tests of selection are much more sensitive to certain model assumptions than previously believed.

  • HyPhy 2.5—A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies

    Molecular Biology and Evolution · 2019-08-25 · 761 citations

    article · Open access · Senior author

    HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.

  • Datamonkey 2.0: A Modern Web Application for Characterizing Selective and Other Evolutionary Processes

    Molecular Biology and Evolution · 2017-12-30 · 1041 citations

    article · Open access

    Inference of how evolutionary forces have shaped extant genetic diversity is a cornerstone of modern comparative sequence analysis. Advances in sequence generation and increased statistical sophistication of relevant methods now allow researchers to extract ever more evolutionary signal from the data, albeit at an increased computational cost. Here, we announce the release of Datamonkey 2.0, a completely re-engineered version of the Datamonkey web-server for analyzing evolutionary signatures in sequence data. For this endeavor, we leveraged recent developments in open-source libraries that facilitate interactive, robust, and scalable web application development. Datamonkey 2.0 provides a carefully curated collection of methods for interrogating coding-sequence alignments for imprints of natural selection, packaged as a responsive (i.e. can be viewed on tablet and mobile devices), fully interactive, and API-enabled web application. To complement Datamonkey 2.0, we additionally release HyPhy Vision, an accompanying JavaScript application for visualizing analysis results. HyPhy Vision can also be used separately from Datamonkey 2.0 to visualize locally executed HyPhy analyses. Together, Datamonkey 2.0 and HyPhy Vision showcase how scientific software development can benefit from general-purpose open-source frameworks. Datamonkey 2.0 is freely and publicly available at http://www.datamonkey.org, and the underlying codebase is available from https://github.com/veg/datamonkey-js.

  • The Computational Phyloinformatics Summer Course Wikis

    Zenodo (CERN European Organization for Nuclear Research) · 2015-06-18

    article · Open access

    These are snapshots of the 2008-2012 course wikis for the Computational Phyloinformatics Summer Courses. (There was no course wiki for the 2007 course.)

    Computational Phyloinformatics Summer Course: Computational Phyloinformatics is a 10- to 14-day intensive summer workshop established at NESCent, but often co-sponsored and hosted at other institutions. The workshop aims to give biologists practical knowledge and hands-on programming skills in phyloinformatics. The curriculum changes from year to year, but has included PERL (BioPerl, BioPhyo), SQL (BioSQL, TreeBASE), JAVA (JEBL, PAL, Mesquite), R (Ape), HyPhy, and BioRuby.

    Synopsis: Biologists are faced with ever-larger datasets, more complex evolutionary models, and increasingly elaborate analytical methods. Seldom is it sufficient to run a dataset with an off-the-shelf program on a desktop PC; increasingly, biologists need to write scripts to interface with internet services and databases, build analytical pipelines, customize analyses, and distribute computation over multiple processors. This course is designed for graduate students, postdocs, and researchers in phylogenetics interested in receiving practical, hands-on training in the use of scripting languages for solving phyloinformatics problems. Students will learn how to write basic phylogenetic or comparative analysis scripts, parse various data files, traverse and compute over trees, and make practical use of phylogenetic software libraries. These skills will be learned in a biological context, touching on a diverse array of topics such as analysis of large datasets, automation of supertree assembly, scripting multiple sequence alignment processing, gene duplication inference, querying for topological patterns in large collections of trees, etc. Participants leave the course with their laptops filled with working software and programming libraries to apply to their own research projects.

    Current and Prior Workshops: NESCent, Durham, NC, USA (2007); NESCent, Durham, NC, USA (2008); Instituto Gulbenkian, Lisbon, Portugal (2009); BGI-Shenzhen, China (2010); Kyoto, Japan (2011); Moscow, Russia (2012).

  • Training the Next Generation of Quantitative Biologists in the Era of Big Data

    2014-11-01 · 5 citations

    article

    The following sections are included: Workshop Focus, Workshop Contributions and References.
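
The variance bound described in the PLoS ONE 2020 abstract above (equiprobable rate categories) can be illustrated with a short Python sketch. This is not the authors' code. It assumes non-negative rates, in which case K equally weighted categories with mean m cannot have variance above (K - 1) * m^2, a limit reached only when a single category carries all of the rate. The lognormal rate distribution, the median-of-bin discretization, and every function name below are assumptions chosen so that the example needs only the standard library.

```python
import math
from statistics import NormalDist


def equiprobable_rates(sigma: float, k: int) -> list[float]:
    """Discretize a lognormal rate distribution into k equiprobable categories.

    Hypothetical helper: each category is represented by the median of its
    quantile bin, and the rates are rescaled so their mean is exactly 1.
    """
    std_normal = NormalDist()
    raw = [
        math.exp(sigma * std_normal.inv_cdf((2 * i + 1) / (2 * k)))
        for i in range(k)
    ]
    mean = sum(raw) / k
    return [r / mean for r in raw]


def discrete_variance(rates: list[float]) -> float:
    """Variance of a discrete distribution with equal weight on each rate."""
    k = len(rates)
    mean = sum(rates) / k
    return sum((r - mean) ** 2 for r in rates) / k


def variance_upper_bound(k: int, mean: float = 1.0) -> float:
    """Largest variance k non-negative equiprobable rates can have."""
    return (k - 1) * mean ** 2


if __name__ == "__main__":
    k = 4
    for sigma in (0.5, 1.0, 2.0, 4.0):
        rates = equiprobable_rates(sigma, k)
        # Variance of the continuous mean-one lognormal being approximated.
        continuous = math.exp(sigma ** 2) - 1
        print(
            f"sigma={sigma}: continuous variance={continuous:.3g}, "
            f"discretized variance={discrete_variance(rates):.3g}, "
            f"bound={variance_upper_bound(k):.3g}"
        )
```

With K = 4 and mean 1 the discretized variance can never exceed 3, while a mean-one lognormal with sigma = 2 has variance e^4 - 1 (about 53.6). However variable the underlying rates are, the four-category approximation cannot represent that variability, which is the sense in which the abstract says the bound can bias estimates of site-to-site rate variation.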

