Frank DiMaio

Associate Professor

University of Washington · Bioengineering

Active 2003–2026

h-index: 76

Citations: 26.4k

Papers: 264 (109 in the last 5 years)

Funding: $4.4M (1 active grant)

About

Frank DiMaio is an Associate Professor in Biochemistry at the University of Washington, specializing in protein structure determination and computational modeling. His research focuses on developing novel computational tools to determine high-resolution protein structures from low-resolution experimental data, drawing on information from previously solved structures. He is interested in improving methods for conformational sampling, modeling of energetics, and identifying missing physical parameters to enhance the accuracy of protein models. His work also includes the prediction and design of symmetric protein assemblies, leveraging the natural abundance of symmetry in biological systems. His research aims to understand the mechanisms behind symmetric protein assembly, improve protein crystallization techniques, and develop symmetric protein assemblies for materials design. DiMaio's contributions are centered on advancing computational approaches to understand and manipulate protein structures and assemblies.

Research topics

Computer Science
Biology
Chemistry
Artificial Intelligence
Biochemistry
Materials science
Computational biology
Nanotechnology
Mathematics
Engineering
Biophysics
Data science
Machine Learning
Biological system
Human–computer interaction
Cell biology
Crystallography
Telecommunications
World Wide Web
Medicine
Systems engineering
Algorithm
Programming language
Software engineering

Selected publications

Using experimental results of protein design to guide biomolecular energy-function development
PLoS Computational Biology · 2026-04-22
article · Open access · Senior author
Computational models of macromolecules have many applications in biochemistry, but physical inaccuracies limit their utility. One class of models uses energy functions rooted in classical mechanics. The standard datasets used to train these models are limited in diversity, pointing to a need for new training data. Here, we sought to explore a new paradigm for training an energy function, where the Rosetta energy function was used to design de novo proteins. Experimental results on these designs were then used to identify failure modes of design, which were subsequently used as a "guiding principle" to retrain the energy function. Specifically, we examined a diverse set of de novo protein designs experimentally tested for their ability to stably fold, identifying unstable designs that were predicted to be stable by the Rosetta energy function. Using deep mutational scanning, we identified single amino-acid mutations that rescued the stability of these designs, providing insight into common failure modes of the energy function. We identified one key failure mode, involving steric clashing in protein cores. We identified similar overpacking when using Rosetta to refine high-resolution protein crystal structures, quantified the degree of overpacking, and refit a small set of energy-function parameters to better recapitulate native-like packing. Following fitting, we largely eliminated the failure mode in the refinement task, while retaining performance on other benchmarks, resulting in an updated version of the Rosetta energy function. This work shows how learning from protein designs can guide energy-function development.
The unique architecture of umbrella toxins permits a two-tiered molecular bet-hedging strategy for interbacterial antagonism
UNC Libraries · 2026-03-03
article · Open access
Perturbing the energy landscape for improved packing during computational protein design
UNC Libraries · 2026-04-14
article · Open access · Senior author
The FastDesign protocol in the molecular modeling program Rosetta iterates between sequence optimization and structure refinement to stabilize de novo designed protein structures and complexes. FastDesign has been used previously to design novel protein folds and assemblies with important applications in research and medicine. To promote sampling of alternative conformations and sequences, FastDesign includes stages where the energy landscape is smoothened by reducing repulsive forces. Here, we discover that this process disfavors larger amino acids in the protein core because the protein compresses in the early stages of refinement. By testing alternative ramping strategies for the repulsive weight, we arrive at a scheme that produces lower energy designs with more native-like sequence composition in the protein core. We further validate the protocol by designing and experimentally characterizing over 4000 proteins and show that the new protocol produces higher stability proteins.
RosettaSearch: Multi-Objective Inference-Time Search for Protein Sequence Design
ArXiv.org · 2026-04-19
article · Open access · Senior author
We introduce RosettaSearch, an inference-time multi-objective optimization approach for backbone-conditioned protein sequence design. We use large language models (LLMs) as a generative optimizer within a search algorithm capable of controlled exploration and exploitation, using rewards computed from RosettaFold3, a structure prediction model, under a strict computational budget. In a large-scale evaluation, we apply RosettaSearch to 400 suboptimal sequences generated by LigandMPNN (a state-of-the-art model trained for protein sequence design), recovering high-fidelity designs that LigandMPNN's single-pass decoding fails to produce. RosettaSearch's designs show improvements in structural fidelity metrics ranging from 18% to 68%, translating to a 2.5x improvement in design success rate. We observe that these gains in success rate are robust when RosettaSearch-designed sequences are evaluated with an independent structure prediction oracle (Chai-1) and generalize across two distinct LLM families (o4-mini and Gemini-3), with performance scaling consistently with reasoning capability. We further demonstrate that RosettaSearch improves the sequence fidelity of ProteinMPNN designs for de novo backbones from the Dayhoff atlas, showing that the approach generalizes beyond native protein structures to computationally generated backbones. We also demonstrate a multi-modal extension of RosettaSearch with vision-language models, where images of predicted protein structures are used as feedback to incorporate structural context to guide protein sequence generation. To our knowledge, this is the first large-scale demonstration that LLMs can serve as effective generative optimizers for backbone-conditioned protein sequence design, yielding systematic gains without any model retraining.
Modeling protein–small molecule conformational ensembles with PLACER
Proceedings of the National Academy of Sciences · 2025-11-04 · 7 citations
article · Open access
Modeling the conformational heterogeneity of protein–small molecule interactions is important for understanding natural systems and evaluating designed systems but remains an outstanding challenge. We reasoned that while residue-level descriptions of biomolecules are efficient for de novo structure prediction, for probing heterogeneity of interactions with small molecules in the folded state, an entirely atomic-level description could have advantages in speed and generality. We developed a graph neural network called PLACER (protein-ligand atomistic conformational ensemble resolver) trained to recapitulate correct atomic positions from partially corrupted input structures from the Cambridge Structural Database and the Protein Data Bank; the nodes of the graph are the atoms in the system. PLACER accurately generates structures of diverse organic small molecules given knowledge of their atom composition and bonding. When given a description of the larger protein context, it builds up structures of small molecules and protein side chains for protein–small molecule docking. Because PLACER is rapid and stochastic, ensembles of predictions can be readily generated to map conformational heterogeneity. In enzyme design efforts described here and elsewhere, we find that using PLACER to assess the accuracy and preorganization of the designed active sites results in higher success rates and higher activities; we obtain a preorganized retroaldolase with a k_cat/K_M of 11,000 M⁻¹ min⁻¹, considerably higher than any pre–deep learning design for this reaction. We anticipate that PLACER will be widely useful for rapidly generating conformational ensembles of small molecule and small molecule–protein systems and for designing higher activity preorganized enzymes.
De novo design of phosphotyrosine peptide binders
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-30 · 2 citations
preprint · Open access
Phosphorylation on tyrosine is a key step in many signaling pathways. Despite recent progress in de novo design of protein binders, there are no current methods for designing binders that recognize phosphorylated proteins and peptides; this is a challenging problem as phosphate groups are highly charged, and phosphorylation often occurs within unstructured regions. Here we introduce RoseTTAFold Diffusion 2 for Molecular Interfaces (RFD2-MI), a deep generative framework for the design of binders for protein, ligand, and covalently modified protein targets. We demonstrate the power and versatility of this method by designing binders for four critical phosphotyrosine sites on three clinically relevant targets: Cluster of Differentiation 3 (CD3ε), Epidermal Growth Factor Receptor (EGFR), Insulin Receptor (INSR) and Signal Transducer and Activator of Transcription 5 (STAT5). Experimental characterization shows that the designs bind their phosphotyrosine-containing targets with affinities comparable to native binding sites and have negligible binding to non-phosphorylated targets or phosphopeptides with different sequences. X-ray crystal structures of generated binders to CD3ε and EGFR are very close to the design models, demonstrating the accuracy of the design approach. A designed binder to an EGFR intracellular region phosphorylated upon EGF activation co-localizes with the receptor following EGF stimulation in single-particle tracking (SPT) experiments, demonstrating pY-specific recognition in living cells. RFD2-MI provides a generalizable all-atom diffusion framework for probing and modulating phosphorylation-dependent signaling, and more generally, for developing research tools and targeted therapeutics against post-translationally modified proteins.
De novo Design of All-atom Biomolecular Interactions with RFdiffusion3
bioRxiv (Cold Spring Harbor Laboratory) · 2025-09-18 · 29 citations
preprint · Open access
Deep learning has accelerated protein design, but most existing methods are restricted to generating protein backbone coordinates and often neglect interactions with other biomolecules. We present RFdiffusion3 (RFD3), a diffusion model that generates protein structures in the context of ligands, nucleic acids and other non-protein constellations of atoms. Because all polymer atoms are modeled explicitly, conditioning the model on complex sets of atom-level constraints for enzyme design and other challenges is both simpler and more effective than previous approaches. RFD3 achieves improved performance compared to prior approaches on a range of in silico benchmarks with one tenth the computational cost. Finally, we demonstrate the broad applicability of RFD3 by designing and experimentally characterizing DNA binding proteins and cysteine hydrolases. The ability to rapidly generate protein structures guided by complex sets of atom-level constraints in the context of arbitrary non-protein atoms should further expand the range of functions attainable through protein design.
Accurate de novo design of high-affinity protein-binding macrocycles using deep learning
JuSER Publikationsportal · 2025-01-01
article · Open access
Developing macrocyclic binders to therapeutic proteins typically relies on large-scale screening methods that are resource intensive and provide little control over binding mode. Despite progress in protein design, there are currently no robust approaches for de novo design of protein-binding macrocycles. Here we introduce RFpeptides, a denoising diffusion-based pipeline for designing macrocyclic binders against protein targets of interest. We tested 20 or fewer designed macrocycles against each of four diverse proteins and obtained binders with medium to high affinity against all targets. For one of the targets, Rhombotarget A (RbtA), we designed a high-affinity binder (Kd < 10 nM) despite starting from the predicted target structure. X-ray structures for macrocycle-bound myeloid cell leukemia 1, γ-aminobutyric acid type A receptor-associated protein and RbtA complexes match closely with the computational models, with a Cα root-mean-square deviation < 1.5 Å to the design models. RFpeptides provides a framework for rapid and custom design of macrocyclic peptides for diagnostic and therapeutic applications.
De novo design of RNA and nucleoprotein complexes
bioRxiv (Cold Spring Harbor Laboratory) · 2025-10-02 · 4 citations
preprint · Open access
Nucleic acids fold into sequence-dependent tertiary structures and carry out diverse biological functions, much like proteins. However, while considerable advances have been made in the de novo design of protein structure and function, the same has not yet been achieved for RNA tertiary structures of similar intricacy. Here, we describe a generative diffusion framework, RFDpoly, for generalized de novo biopolymer (RNA, DNA and protein) design, and use it to create diverse and designable RNA structures. We design RNA structures with novel folds and experimentally validate them using a combination of chemical footprinting (SHAPE-seq) and electron microscopy. We further use this approach to design protein-nucleic acid assemblies; the crystal structure of one such design is nearly identical to the design model. This work demonstrates that the principles of structure-based de novo protein design can be extended to nucleic acids, opening the door to creating a wide range of new RNA structures and protein-nucleic acid complexes.

Recent grants

Multimodal Gating Mechanisms of TRPV1 Ion Channels
NIH · $1.8M · 2018–2022
Protein structure determination from low-resolution experimental data
NIH · $2.6M · 2017–2026

Frequent coauthors

David Baker
University of Washington
170 shared
Daniel P. Farrell
University of Washington
49 shared
Paul D. Adams
Joint BioEnergy Institute
29 shared
Hahnbeom Park
Korea Institute of Brain Science
26 shared
David Veesler
University of Washington
25 shared
Minkyung Baek
24 shared
Sergey Ovchinnikov
Harvard University
23 shared
Wah Chiu
Stanford University
21 shared
