Brian Hie

Assistant Professor of Chemical Engineering · Verified

Stanford University · Chemical Engineering

Active 2016–2026

h-index: 24
Citations: 6.8k
Papers: 62 (48 in the last 5 years)

About

Brian Hie is an Assistant Professor of Chemical Engineering at Stanford University. He is also the Dieter Schwarz Foundation Stanford Data Science Faculty Fellow and an Innovation Investigator at Arc Institute. He supervises the Laboratory of Evolutionary Design, which works at the intersection of biology and machine learning. He completed a Ph.D. in Electrical Engineering and Computer Science at MIT CSAIL in 2021 and an M.S. in the same field at MIT in 2019. He earned his undergraduate degree in Computer Science from Stanford University in 2016. Before his current role, he was a Stanford Science Fellow in the Stanford University School of Medicine and a Visiting Researcher at Meta AI.

Research topics

  • Computer Science
  • Genetics
  • Biology
  • Natural Language Processing
  • Computational biology
  • Artificial Intelligence
  • Machine Learning
  • Virology
  • Programming language
  • Cartography
  • Geography
  • Philosophy
  • Linguistics

Selected publications

  • Genome modelling and design across all domains of life with Evo 2

    Nature · 2026-03-04 · 32 citations

    article · Open access · Senior author

    All of life encodes information with DNA. Although tools for genome sequencing, synthesis and editing have transformed biological research, we still lack sufficient understanding of the immense complexity encoded by genomes to predict the effects of many classes of genomic changes or to intelligently compose new biological systems. Artificial intelligence models that learn information from genomic sequences across diverse organisms have increasingly advanced prediction and design capabilities. Here we introduce Evo 2, a biological foundation model trained on 9 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life to have a 1 million token context window with single-nucleotide resolution. Evo 2 learns to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific fine-tuning. Mechanistic interpretability analyses reveal that Evo 2 learns representations associated with biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements and prophage genomic regions. The generative abilities of Evo 2 produce mitochondrial, prokaryotic and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Evo 2 also generates experimentally validated chromatin accessibility patterns when guided by predictive models and inference-time search. We have made Evo 2 fully open, including model parameters, training code, inference code and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity. Evo 2 is an artificial intelligence-based biological foundation model trained on 9 trillion DNA base pairs spanning all domains of life that predicts functional properties from genomic sequences and provides a rich generative model for researchers in biology.

  • Rapid directed evolution guided by protein language models and epistatic interactions

    Science · 2026-02-19 · 7 citations

    article · Open access

    Protein engineering is limited by the inefficient search through a high-dimensional sequence space to find combinations of synergistic mutations. Traditional approaches use stepwise mutation stacking, whereas machine learning methods require extensive datasets or multiple experimental rounds and are bottlenecked by costly, length-limited gene synthesis. We present MULTI-evolve (where MULTI stands for model-guided, universal, targeted installation of multimutants), a rapid evolution framework that systematically engineers multimutants. Our approach combines protein language models or existing functional data with epistatic modeling to predict synergistic combinations. Proposed multimutants are built through MULTI-assembly, a mutagenesis method enabling high-efficiency assembly across multikilobase sequences. Applying MULTI-evolve to three proteins achieved up to 10-fold improvements with a single round of machine learning-guided directed evolution. MULTI-evolve provides a streamlined approach for end-to-end, multimutant engineering for a broad range of protein types and functions.

  • Efficient generation of epitope-targeted de novo antibodies with Germinal

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-24 · 20 citations

    preprint · Open access · Corresponding

    Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.

  • Semantic design of functional de novo genes from a genomic language model

    Nature · 2025-11-19 · 13 citations

    article · Open access · Senior author

    Generative genomic models can design increasingly complex biological systems. However, controlling these models to generate novel sequences with desired functions remains challenging. Here, we show that Evo, a genomic language model, can leverage genomic context to perform function-guided design that accesses novel regions of sequence space. By learning semantic relationships across prokaryotic genes, Evo enables a genomic ‘autocomplete’ in which a DNA prompt encoding genomic context for a function of interest guides the generation of novel sequences enriched for related functions, which we refer to as ‘semantic design’. We validate this approach by experimentally testing the activity of generated anti-CRISPR proteins and type II and III toxin–antitoxin systems, including de novo genes with no significant sequence similarity to natural proteins. In-context design of proteins and non-coding RNAs with Evo achieves robust activity and high experimental success rates even in the absence of structural priors, known evolutionary conservation or task-specific fine-tuning. We then use Evo to complete millions of prompts to produce SynGenome, a database containing over 120 billion base pairs of artificial intelligence-generated genomic sequences that enables semantic design across many functions. More broadly, these results demonstrate that generative genomics with biological language models can extend beyond natural sequences.

  • Sidechain conditioning and modeling for full-atom protein sequence design with FAMPNN

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-02-17 · 6 citations

    preprint · Open access · Senior author · Corresponding

    Leading deep learning-based methods for fixed-backbone protein sequence design do not model protein sidechain conformation during sequence generation despite the large role the three-dimensional arrangement of sidechain atoms play in protein conformation, stability, and overall protein function. Instead, these models implicitly reason about crucial sidechain interactions based on backbone geometry and known amino acid sequence labels. To address this, we present FAMPNN (Full-Atom MPNN), a sequence design method that explicitly models both sequence identity and sidechain conformation for each residue, where the per-token distribution of a residue’s discrete amino acid identity and its continuous sidechain conformation are learned with a combined categorical cross-entropy and diffusion loss objective. We demonstrate that learning these distributions jointly is a highly synergistic task that both improves sequence recovery while achieving state-of-the-art sidechain packing. Furthermore, benefits from explicit full-atom modeling generalize from sequence recovery to practical protein design applications, such as zero-shot prediction of experimental binding and stability measurements.

  • Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale

    ArXiv.org · 2025-02-25 · 3 citations

    article · Open access

    We introduce convolutional multi-hybrid architectures, with a design grounded on two simple observations. First, operators in hybrid models can be tailored to token manipulation tasks such as in-context recall, multi-token recall, and compression, with input-dependent convolutions and attention offering complementary performance. Second, co-designing convolution operators and hardware-aware algorithms enables efficiency gains in regimes where previous alternative architectures struggle to surpass Transformers. At the 40 billion parameter scale, we train end-to-end 1.2 to 2.9 times faster than optimized Transformers, and 1.1 to 1.4 times faster than previous generation hybrids. On H100 GPUs and model width 4096, individual operators in the proposed multi-hybrid StripedHyena 2 architecture achieve two-fold throughput improvement over linear attention and state-space models. Multi-hybrids excel at sequence modeling over byte-tokenized data, as demonstrated by the Evo 2 line of models. We discuss the foundations that enable these results, including architecture design, overlap-add blocked kernels for tensor cores, and dedicated all-to-all and point-to-point context parallelism strategies.

  • Generative design of novel bacteriophages with genome language models

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-17 · 31 citations

    preprint · Open access · Senior author · Corresponding

    Many important biological functions arise not from single genes, but from complex interactions encoded by entire genomes. Genome language models have emerged as a promising strategy for designing biological systems, but their ability to generate functional sequences at the scale of whole genomes has remained untested. Here, we report the first generative design of viable bacteriophage genomes. We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism, using the lytic phage ΦX174 as our design template. Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. Cryo-electron microscopy revealed that one of the generated phages utilizes an evolutionarily distant DNA packaging protein within its capsid. Multiple phages demonstrate higher fitness than ΦX174 in growth competitions and in their lysis kinetics. A cocktail of the generated phages rapidly overcomes ΦX174-resistance in three E. coli strains, demonstrating the potential utility of our approach for designing phage therapies against rapidly evolving bacterial pathogens. This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.

  • Genome modeling and design across all domains of life with Evo 2

    bioRxiv (Cold Spring Harbor Laboratory) · 2025-02-21 · 190 citations

    preprint · Open access · Senior author · Corresponding

    All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation, from noncoding pathogenic mutations to clinically significant BRCA1 variants, without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.

  • Utilizing Machine Learning to Improve Neutralization Potency of an HIV-1 Antibody Targeting the gp41 N-Heptad Repeat

    ACS Chemical Biology · 2025-06-20 · 1 citation

    article · Open access

    The N-heptad repeat (NHR) of the HIV-1 gp41 prehairpin intermediate (PHI) is an attractive potential vaccine target with high sequence conservation across diverse strains. However, despite the potency of NHR-targeting peptides and clinical efficacy of the NHR-targeting entry inhibitor enfuvirtide, no potently neutralizing NHR-directed monoclonal antibodies (mAbs) nor antisera have been identified or elicited to date. The lack of potent NHR-binding mAbs both dampens enthusiasm for vaccine development efforts at this target and presents a barrier to performing passive immunization experiments with NHR-targeting antibodies. To address this challenge, we previously developed an improved variant of the NHR-directed mAb D5, called D5_AR, which is capable of neutralizing diverse tier-2 viruses. Building on that work, here we present the 2.7Å-crystal structure of D5_AR bound to NHR mimetic peptide IQN17. We then utilize protein language models and supervised machine learning to generate small (n < 100) libraries of D5_AR variants that are subsequently screened for improved neutralization potency. We identify a variant with 5-fold improved neutralization potency, D5_FI, which is the most potent NHR-directed monoclonal antibody characterized to date and exhibits broad neutralization of tier-2 and −3 pseudoviruses as well as replicating R5 and X4 challenge strains. Additionally, our work highlights the ability of protein language models to efficiently identify improved mAb variants from relatively small libraries.

  • Scanorama: integrating large and diverse single-cell transcriptomic datasets

    Nature Protocols · 2024-06-06 · 38 citations

    review · Open access · 1st author · Corresponding

Education

  • Ph.D., Electrical Engineering and Computer Science

    MIT (CSAIL)

    2021
  • M.S., Electrical Engineering and Computer Science

    MIT

    2019
  • B.S., Computer Science

    Stanford University

    2016

Awards & honors

  • Dieter Schwarz Foundation Stanford Data Science Faculty Fell…
  • Stanford Science Fellow