Tommi Jaakkola

Verified

Massachusetts Institute of Technology · Electrical Engineering & Computer Science

Active 1994–2026

h-index92

Citations40.4k

Papers475155 last 5y

Funding—

Faculty page

See your match with Tommi Jaakkola — sign in to PhdFit.Sign in

About

Tommi Jaakkola is the Thomas M. Siebel Distinguished Professor at MIT, specializing in Artificial Intelligence and Decision-making within the Department of Electrical Engineering and Computer Science. His research areas include AI for Healthcare and Life Sciences, Artificial Intelligence and Machine Learning, and Natural Language and Speech Processing. He focuses on developing techniques for the analysis and synthesis of systems that interact with the external world through perception, communication, and action, while also learning, making decisions, and adapting to changing environments. His work involves leveraging computational, theoretical, and experimental tools to advance groundbreaking sensors, energy transducers, physical substrates for computation, and systems addressing shared human challenges.

Research topics

Artificial Intelligence
Computer Science
Machine Learning
Computational biology
Biology
Chemistry
Genetics
Computer Security
Data Mining
Psychology
Cognitive science
Nanotechnology
Theoretical computer science
Bioinformatics
Evolutionary biology
Biochemistry
Materials science
Engineering
Microbiology
Data science
Mathematics
Physics

Selected publications

AI-based methods for simulating, sampling, and predicting protein ensembles
Current Opinion in Structural Biology · 2026-04-02
articleSenior author
Publisher DOI
Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation
Open MIND · 2026-02-16
preprintSenior author
Many generative tasks in chemistry and science involve distributions invariant to group symmetries (e.g., permutation and rotation). A common strategy enforces invariance and equivariance through architectural constraints such as equivariant denoisers and invariant priors. In this paper, we challenge this tradition through the alternative canonicalization perspective: first map each sample to an orbit representative with a canonical pose or order, train an unconstrained (non-equivariant) diffusion or flow model on the canonical slice, and finally recover the invariant distribution by sampling a random symmetry transform at generation time. Building on a formal quotient-space perspective, our work provides a comprehensive theory of canonical diffusion by proving: (i) the correctness, universality and superior expressivity of canonical generative models over invariant targets; (ii) canonicalization accelerates training by removing diffusion score complexity induced by group mixtures and reducing conditional variance in flow matching. We then show that aligned priors and optimal transport act complementarily with canonicalization and further improves training efficiency. We instantiate the framework for molecular graph generation under $S_n \times SE(3)$ symmetries. By leveraging geometric spectra-based canonicalization and mild positional encodings, canonical diffusion significantly outperforms equivariant baselines in 3D molecule generation tasks, with similar or even less computation. Moreover, with a novel architecture Canon, CanonFlow achieves state-of-the-art performance on the challenging GEOM-DRUG dataset, and the advantage remains large in few-step generation.
DOI
Rethinking Diffusion Models with Symmetries through Canonicalization with Applications to Molecular Graph Generation
ArXiv.org · 2026-02-16
articleOpen accessSenior author
Many generative tasks in chemistry and science involve distributions invariant to group symmetries (e.g., permutation and rotation). A common strategy enforces invariance and equivariance through architectural constraints such as equivariant denoisers and invariant priors. In this paper, we challenge this tradition through the alternative canonicalization perspective: first map each sample to an orbit representative with a canonical pose or order, train an unconstrained (non-equivariant) diffusion or flow model on the canonical slice, and finally recover the invariant distribution by sampling a random symmetry transform at generation time. Building on a formal quotient-space perspective, our work provides a comprehensive theory of canonical diffusion by proving: (i) the correctness, universality and superior expressivity of canonical generative models over invariant targets; (ii) canonicalization accelerates training by removing diffusion score complexity induced by group mixtures and reducing conditional variance in flow matching. We then show that aligned priors and optimal transport act complementarily with canonicalization and further improves training efficiency. We instantiate the framework for molecular graph generation under $S_n \times SE(3)$ symmetries. By leveraging geometric spectra-based canonicalization and mild positional encodings, canonical diffusion significantly outperforms equivariant baselines in 3D molecule generation tasks, with similar or even less computation. Moreover, with a novel architecture Canon, CanonFlow achieves state-of-the-art performance on the challenging GEOM-DRUG dataset, and the advantage remains large in few-step generation.
Publisher OA PDF
PackFlow: Generative Molecular Crystal Structure Prediction via Reinforcement Learning Alignment
ArXiv.org · 2026-01-01
articleOpen access
Organic molecular crystals underpin technologies ranging from pharmaceuticals to organic electronics, yet predicting solid-state packing of molecules remains challenging because candidate generation is combinatorial and stability is only resolved after costly energy evaluations. Here we introduce PackFlow, a flow matching framework for molecular crystal structure prediction (CSP) that generates heavy-atom crystal proposals by jointly sampling Cartesian coordinates and unit-cell lattice parameters given a molecular graph. This lattice-aware generation interfaces directly with downstream relaxation and lattice-energy ranking, positioning PackFlow as a scalable proposal engine within standard CSP pipelines. To explicitly steer generation toward physically favourable regions, we propose physics alignment, a reinforcement learning post-training stage that uses machine-learned interatomic potential energies and forces as stability proxies. Physics alignment improves physical validity without altering inference-time sampling. We validate PackFlow's performance against heuristic baselines through two distinct evaluations. First, on a broad unseen set of molecular systems, we demonstrate superior candidate generation capability, with proposals exhibiting greater structural similarity to experimental polymorphs. Second, we assess the full end-to-end workflow on two unseen CSP blind-test case studies, including relaxation and lattice-energy analysis. In both settings, PackFlow outperforms heuristics-based methods by concentrating probability mass in low-energy basins, yielding candidates that relax into lower-energy minima and offering a practical route to amortize the relax-and-rank bottleneck.
Publisher OA PDF
FragmentFlow: Scalable Transition State Generation for Large Molecules
Open MIND · 2026-02-02
preprintSenior author
Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Although traditional TS generation methods are computationally expensive, recent generative modeling approaches have enabled chemically meaningful TS prediction for relatively small molecules. However, these methods fail to generalize to practically relevant reaction substrates because of distribution shifts induced by increasing molecular sizes. Furthermore, TS geometries for larger molecules are not available at scale, making it infeasible to train generative models from scratch on such molecules. To address these challenges, we introduce FragmentFlow: a divide-and-conquer approach that trains a generative model to predict TS geometries for the reactive core atoms, which define the reaction mechanism. The full TS structure is then reconstructed by re-attaching substituent fragments to the predicted core. By operating on reactive cores, whose size and composition remain relatively invariant across molecular contexts, FragmentFlow mitigates distribution shifts in generative modeling. Evaluated on a new curated dataset of reactions involving reactants with up to 33 heavy atoms, FragmentFlow correctly identifies 90% of TSs while requiring 30% fewer saddle-point optimization steps than classical initialization schemes. These results point toward scalable TS generation for high-throughput reactivity studies.
DOI
Protein FID: improved evaluation of protein structure generative models
Bioinformatics · 2026-04-01
articleOpen access
MOTIVATION: Protein structure generative models have seen a recent surge of interest, but meaningfully evaluating them computationally is an active area of research. While current metrics have driven useful progress, they do not capture how well models sample the design space represented by the training data. We argue for a protein Frechet Inception Distance (FID) metric to supplement current evaluations with a measure of distributional similarity in a semantically meaningful latent space. RESULTS: Our FID behaves desirably under protein structure perturbations and correctly recapitulates similarities between protein samples: it correlates with optimal transport distances and recovers FoldSeek clusters and the CATH hierarchy. Evaluating current protein structure generative models with FID shows that they fall short of modeling the distribution of PDB proteins. AVAILABILITY: Code is available at: https://github.com/ffaltings/protfid.
Publisher DOI
Turbulence teaches equivariance to neural networks
arXiv (Cornell University) · 2026-02-04
articleOpen access
We investigate how the rotational nature of turbulence affects learned mappings between quantities governed by the Navier-Stokes equations. By varying the degree of anisotropy in a turbulence dataset, we explore how statistical symmetry affects these mappings. To do this, we train super-resolution models at different wall-normal locations in a channel flow, where anisotropy varies naturally, and test their generalization. By evaluating the learned mappings on new coordinate frames and new flow conditions, we find that coordinate-frame generalization is a key part of the generalization problem. Turbulent flows naturally present a wide range of local orientations, so respecting the symmetries of the Navier-Stokes equations improves generalization to new flows. Importantly, turbulence's rotational structure can embed these symmetries into learned mappings -- an effect that strengthens with isotropy and dataset size. This is because a more isotropic dataset samples a wider range of orientations, more fully covering the rotational symmetries of the Navier-Stokes equations. The dependence on isotropy means equivariance error is also scale-dependent, consistent with Kolmogorov's hypothesis. Therefore, turbulence provides its own data augmentation (we term this implicit data augmentation). We expect this effect to apply broadly to learned mappings between tensorial flow quantities, making it relevant to most machine learning applications in turbulence.
Publisher OA PDF
FragmentFlow: Scalable Transition State Generation for Large Molecules
ArXiv.org · 2026-02-02
articleOpen accessSenior author
Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Although traditional TS generation methods are computationally expensive, recent generative modeling approaches have enabled chemically meaningful TS prediction for relatively small molecules. However, these methods fail to generalize to practically relevant reaction substrates because of distribution shifts induced by increasing molecular sizes. Furthermore, TS geometries for larger molecules are not available at scale, making it infeasible to train generative models from scratch on such molecules. To address these challenges, we introduce FragmentFlow: a divide-and-conquer approach that trains a generative model to predict TS geometries for the reactive core atoms, which define the reaction mechanism. The full TS structure is then reconstructed by re-attaching substituent fragments to the predicted core. By operating on reactive cores, whose size and composition remain relatively invariant across molecular contexts, FragmentFlow mitigates distribution shifts in generative modeling. Evaluated on a new curated dataset of reactions involving reactants with up to 33 heavy atoms, FragmentFlow correctly identifies 90% of TSs while requiring 30% fewer saddle-point optimization steps than classical initialization schemes. These results point toward scalable TS generation for high-throughput reactivity studies.
Publisher OA PDF
Zatom-1: Towards a Multimodal Foundation Model for 3D Molecules and Materials
arXiv (Cornell University) · 2026-02-24
preprintOpen access
General-purpose 3D modeling in chemistry encompasses molecules and materials, requiring both generative and predictive capabilities. However, most existing AI approaches are optimized for a single domain (molecules or materials) and a single task (generation or prediction), which limits representation sharing and transfer. We introduce Zatom-1, a cross-domain, general-purpose model architecture that unifies generative and predictive learning of 3D molecules and materials. Zatom-1 is a deliberately simplified Transformer trained with a multimodal flow matching objective that jointly models discrete atom types and continuous 3D geometries. This approach supports scalable pretraining with predictable gains as model capacity increases, while enabling fast and stable sampling. We use cross-domain generative pretraining as a universal initialization for downstream multi-task prediction of properties, energies, and forces. Empirically, Zatom-1 outperforms or competes with specialized baselines on both multi-task generative and predictive benchmarks in data-controlled settings, while improving generative inference speed by more than an order of magnitude. Our experiments demonstrate positive predictive transfer between data domains from joint generative pretraining: modeling materials during generative pretraining improves molecular property prediction accuracy. Open-source code and model weights are freely available at https://github.com/Zatom-AI/zatom.
Publisher DOI
Turbulence teaches equivariance to neural networks
Open MIND · 2026-02-04
preprint
We investigate how the rotational nature of turbulence affects learned mappings between quantities governed by the Navier-Stokes equations. By varying the degree of anisotropy in a turbulence dataset, we explore how statistical symmetry affects these mappings. To do this, we train super-resolution models at different wall-normal locations in a channel flow, where anisotropy varies naturally, and test their generalization. By evaluating the learned mappings on new coordinate frames and new flow conditions, we find that coordinate-frame generalization is a key part of the generalization problem. Turbulent flows naturally present a wide range of local orientations, so respecting the symmetries of the Navier-Stokes equations improves generalization to new flows. Importantly, turbulence's rotational structure can embed these symmetries into learned mappings -- an effect that strengthens with isotropy and dataset size. This is because a more isotropic dataset samples a wider range of orientations, more fully covering the rotational symmetries of the Navier-Stokes equations. The dependence on isotropy means equivariance error is also scale-dependent, consistent with Kolmogorov's hypothesis. Therefore, turbulence provides its own data augmentation (we term this implicit data augmentation). We expect this effect to apply broadly to learned mappings between tensorial flow quantities, making it relevant to most machine learning applications in turbulence.
DOI

Frequent coauthors

Regina Barzilay
174 shared
Wengong Jin
64 shared
David K. Gifford
Massachusetts Institute of Technology
52 shared
Amir Globerson
Tel Aviv University
34 shared
David Alvarez-Melis
30 shared
Klavs F. Jensen
Massachusetts Institute of Technology
25 shared
Karthik Narasimhan
24 shared
Connor W. Coley
Massachusetts Institute of Technology
24 shared

Labs

MIT EECS Artificial Intelligence + Decision-making LabPI

Resume-aware match score
Save to shortlist
AI-drafted outreach

See your match with Tommi Jaakkola

PhdFit ranks faculty by your research interests, methods, and publications — grounded in their actual work, not templates.

Join the waitlist How it works

Free to start
No credit card
30-second signup

Find professors who actually fit you