Sorin Istrail

James A. & Julie N. Brown Professor of Computational and Mathematical Sciences

Brown University · Computer Science

Active 1978–2022

h-index: 54
Citations: 27.8k
Papers: 238 (10 in the last 5 years)
Funding: $1.5M

About

Sorin Istrail is the James A. and Julie N. Brown Professor of Computational and Mathematical Sciences and Professor of Computer Science at Brown University. He is associated with the Istrail Laboratory, a research group focused on computational biology and computer science within the Department of Computer Science at Brown University. His work involves research in genomics, including the sequencing of the human genome, whole genome shotgun assembly and comparison of human genome assemblies, immune peptidomics of humans and their pathogens, and the genomic study of the sea urchin, including its genome and transcriptome. His research emphasizes understanding the logic functions of the genomic cis-regulatory code, contributing to the broader field of computational molecular biology.

Research topics

  • Artificial Intelligence
  • Computer Science
  • Data Mining
  • Engineering
  • Genetics
  • Bioinformatics
  • Biology
  • Data science
  • Computational biology
  • Theoretical computer science
  • Management science

Selected publications

  • Special Issue: Professor Michael Waterman's 80th Birthday, Part 1

    Journal of Computational Biology · 2022-06-15

    article · 1st author
  • Michael Waterman's Contributions to Computational Biology and Bioinformatics

    Journal of Computational Biology · 2022-06-21 · 2 citations

    review · Senior author

    On the occasion of Dr. Michael Waterman's 80th birthday, we review his major contributions to the field of computational biology and bioinformatics including the famous Smith-Waterman algorithm for sequence alignment, the probability and statistics theory related to sequence alignment, algorithms for sequence assembly, the Lander-Waterman model for genome physical mapping, combinatorics and predictions of ribonucleic acid structures, word counting statistics in molecular sequences, alignment-free sequence comparison, and algorithms for haplotype block partition and tagSNP selection related to the International HapMap Project. His books Introduction to Computational Biology: Maps, Sequences and Genomes for graduate students and Computational Genome Analysis: An Introduction geared toward undergraduate students played key roles in computational biology and bioinformatics education. We also highlight his efforts of building the computational biology and bioinformatics community as the founding editor of the Journal of Computational Biology and a founding member of the International Conference on Research in Computational Molecular Biology (RECOMB).

  • Computational Advances in Bio and Medical Sciences

    Lecture notes in computer science · 2021 · 2 citations

    1st author · Corresponding
    • Computer Science
    • Artificial Intelligence
  • Combinatorial and statistical prediction of gene expression from haplotype sequence

    Bioinformatics · 2020-05-01 · 2 citations

    article · Open access

    MOTIVATION: Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces multiple testing burden, and can prioritize genes for subsequent experimental investigation. RESULTS: In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm that models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best predicted genes (r² > 0.1) among all methods. We show that variant and haplotype features selected by HAPLEXR are smaller in size than competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage dependent and intragenic epistatic effects when predicting expression. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available at https://github.com/rapturous/HAPLEX. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • Proteinarium: Multi-sample protein-protein interaction analysis and visualization tool

    Genomics · 2020 · 18 citations

    • Data Mining
    • Computer Science
    • Biology

    We posit the likely architecture of complex diseases is that subgroups of patients share variants in genes in specific networks which are sufficient to give rise to a shared phenotype. We developed Proteinarium, a multi-sample protein-protein interaction (PPI) tool, to identify clusters of patients with shared gene networks. Proteinarium converts user defined seed genes to protein symbols and maps them onto the STRING interactome. A PPI network is built for each sample using Dijkstra's algorithm. Pairwise similarity scores are calculated to compare the networks and cluster the samples. A layered graph of PPI networks for the samples in any cluster can be visualized. To test this newly developed analysis pipeline, we reanalyzed publicly available data sets, from which modest outcomes had previously been achieved. We found significant clusters of patients with unique genes which enhanced the findings in the original study.

  • Invariant Patterns in Crystal Lattices: Implications for Protein Folding Algorithms

    TUGraz OPEN Library (Graz University of Technology) · 2020-04-07 · 1 citation

    article · Open access · 1st author · Corresponding
  • Preface Special Issue: RECOMB 2018

    Journal of Computational Biology · 2020-03-01

    article · 1st author · Corresponding
  • Proteinarium: Multi-Sample Protein-Protein Interaction Analysis and Visualization Tool

    bioRxiv (Cold Spring Harbor Laboratory) · 2019-03-26 · 2 citations

    preprint · Open access

    Background: Data analysis has become crucial in the post genomic era where the accumulation of genomic information is mounting exponentially. Analyzing protein-protein interactions in the context of the interactome is a powerful approach to understanding disease phenotypes. Results: We describe Proteinarium, a multi-sample protein-protein interaction network analysis and visualization tool. Proteinarium can be used to analyze data for samples with dichotomous phenotypes, multiple samples from a single phenotype or a single sample. Then, by similarity clustering, the network-based relations of samples are identified and clusters of related samples are presented as a dendrogram. Each branch of the dendrogram is built based on network similarities of the samples. The protein-protein interaction networks can be analyzed and visualized on any branch of the dendrogram. Proteinarium’s input can be derived from transcriptome analysis, whole exome sequencing data or any high-throughput screening approach. Its strength lies in use of gene lists for each sample as a distinct input which are further analyzed through protein interaction analyses. Proteinarium output includes the gene lists of visualized networks and PPI interaction files where users can analyze the network(s) on other platforms such as Cytoscape. In addition, since the dendrogram is written in Newick tree format, users can visualize it in other software platforms like Dendroscope, ITOL. Conclusions: Proteinarium, through the analysis and visualization of PPI networks, allows researchers to make important observations on high throughput data for a variety of research questions. Proteinarium identifies significant clusters of patients based on their shared network similarity for the disease of interest and the associated genes. Proteinarium is a command-line tool written in Java with no external dependencies and it is freely available at https://github.com/Armanious/Proteinarium.

  • How Does the Regulatory Genome Work?

    Journal of Computational Biology · 2019-06-05 · 7 citations

    article · Open access · 1st author

    The regulatory genome controls genome activity throughout the life of an organism. This requires that complex information processing functions are encoded in, and operated by, the regulatory genome. Although much remains to be learned about how the regulatory genome works, we here discuss two cases where regulatory functions have been experimentally dissected in great detail and at the systems level, and formalized by computational logic models. Both examples derive from the sea urchin embryo, but assess two distinct organizational levels of genomic information processing. The first example shows how the regulatory system of a single gene, endo16, executes logic operations through individual transcription factor binding sites and cis-regulatory modules that control the expression of this gene. The second example shows information processing at the gene regulatory network (GRN) level. The GRN controlling development of the sea urchin endomesoderm has been experimentally explored at an almost complete level. A Boolean logic model of this GRN suggests that the modular logic functions encoded at the single-gene level show compositionality and suffice to account for integrated function at the network level. We discuss these examples both from a biological-experimental point of view and from a computer science-informational point of view, as both illuminate principles of how the regulatory genome works.

  • Eric Davidson's Regulatory Genome for Computer Science: Causality, Logic, and Proof Principles of the Genomic cis-Regulatory Code

    Journal of Computational Biology · 2019-07-01 · 6 citations

    article · Open access · 1st author · Corresponding

    I think that it is a relatively good approximation to truth, which is much too complicated to allow anything but approximations, that mathematical ideas originate in empirics. But, once they are conceived, the subject begins to live a peculiar life of its own and is governed by almost entirely aesthetical motivations. In other words, at a great distance from its empirical source, or after much "abstract" inbreeding, a mathematical subject is in danger of degeneration. Whenever this stage is reached the only remedy seems to me to be the rejuvenating return to the source: the reinjection of more or less directly empirical ideas. (John von Neumann, 1947)

Frequent coauthors

  • Pavel A. Pevzner

    University of California, San Diego

    223 shared
  • Michael S. Waterman

    219 shared
  • Roberto Tagliaferri

    98 shared
  • Waraporn Tongprasit

    75 shared
  • Manoj P. Samanta

    74 shared
  • Viktor Štolc

    Ames Research Center

    74 shared
  • Eric H. Davidson

    65 shared
  • Marie-France Sagot

    Institut national de recherche en informatique et en automatique

    64 shared