Anthony Gitter

Associate Professor; Investigator, Morgridge Institute for Research

University of Wisconsin-Madison · Biostatistics and Medical Informatics

Active 1967–2026

h-index: 32

Citations: 7.2k

Papers: 196 · 90 in last 5y

Funding: $3.0M

About

Anthony Gitter is an associate professor in the Department of Biostatistics and Medical Informatics and affiliate faculty in the Department of Computer Sciences at the University of Wisconsin-Madison. He also holds the Jeanne M. Rowe Chair at the Morgridge Institute for Research in the John W. and Jeanne M. Rowe Center for Research in Virology and Research Computing. Additionally, he is an affiliate of the Data Science Institute and the Center for Genomic Science Innovation, and a member of the UW Carbone Cancer Center Cancer Genetic and Epigenetic Mechanisms Scientific Program. His research group uses network modeling to integrate genomic, transcriptomic, and proteomic data into a cohesive view of biological processes, with a special emphasis on virology and oncology. The group also explores machine learning applications in biochemistry, including computationally guided chemical screening and protein engineering. Anthony Gitter received his Ph.D. in Computer Science from Carnegie Mellon University and completed a joint postdoctoral fellowship at Microsoft Research New England and the Massachusetts Institute of Technology.

Research topics

Computer Science
Artificial Intelligence
Machine Learning
Data Mining
Genetics
Software engineering
Biology
Computational biology
Database
Statistics
Mathematics
Mathematics education
Psychology

Selected publications

A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data
ChemRxiv · 2026-01-06
article · Open access
Computational blind challenges offer critical, unbiased assessment opportunities to assess and accelerate scientific progress, as demonstrated by a breadth of breakthroughs over the last decade. We report the outcomes and key insights from an open science community blind challenge focused on computational methods in drug discovery, using lead optimization data from the AI-driven Structure-enabled Antiviral Platform (ASAP) Discovery Consortium’s pan-coronavirus antiviral discovery program, in partnership with Polaris and the OpenADMET project. This collaborative initiative invited global participants from both academia and industry to develop and apply computational methods to predict the biochemical potency and crystallographic ligand poses of small molecules against key coronavirus targets, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (Mpro), as well as multiple ADMET assay endpoints, using previously undisclosed comprehensive experimental drug discovery datasets as benchmarks. By evaluating submissions across multiple tasks and compounds, we established performance leaderboards and conducted meta-analyses to assess methodological strengths, common pitfalls, and areas for improvement. This analysis provides a foundation for best practices in real-world machine learning evaluation, grounded in community-driven benchmarking. We also highlight how next-generation platforms, such as Polaris, enable rigorous challenge design, embedded evaluation frameworks, and broad community engagement. This paper reports the collective findings of the challenge, offering a high-level overview of the data, evaluation infrastructure, and top-performing strategies. We further provide context and support for the accompanying papers authored by the challenge participants in this special issue, which explore individual approaches in greater depth. Together, these contributions aim to advance reproducible, trustworthy, and high-impact computational methods in drug discovery, and to explore best practices and pitfalls in future blind challenge design and execution, including planned initiatives for the OpenADMET project.
HaihuaWang-hub/2020-workflows-paper: 2020 workflow paper
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-20
other · Open access
2020 workflow paper
A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data.
Apollo (University of Cambridge) · 2026-03-23
article · Open access
Computational blind challenges offer critical, unbiased opportunities to assess and accelerate scientific progress, as demonstrated by a breadth of breakthroughs over the past decade. We report the outcomes and key insights from an open science community blind challenge focused on computational methods in drug discovery, using lead optimization data from the AI-driven Structure-enabled Antiviral Platform Discovery Consortium's pan-coronavirus antiviral discovery program, in partnership with Polaris and the OpenADMET project. This collaborative initiative invited global participants from both academia and industry to develop and apply computational methods to predict the biochemical potency and crystallographic ligand poses of small molecules against key coronavirus targets, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (Mpro), as well as multiple ADMET assay end points, using previously undisclosed comprehensive experimental drug discovery data sets as benchmarks. By evaluating submissions across multiple tasks and compounds, we established performance leaderboards and conducted meta-analyses to assess methodological strengths, common pitfalls, and areas for improvement. This analysis provides a foundation for best practices in real-world machine learning evaluation, grounded in community-driven benchmarking. We also highlight how next-generation platforms, such as Polaris, enable rigorous challenge design, embedded evaluation frameworks, and broad community engagement. This paper reports the collective findings of the challenge, offering a high-level overview of the data, evaluation infrastructure, and top-performing strategies. We further provide context and support for the accompanying papers authored by the challenge participants in this special issue, which explore individual approaches in greater depth. Together, these contributions aim to advance reproducible, trustworthy, and high-impact computational methods in drug discovery, and to explore best practices and pitfalls in future blind challenge design and execution, including planned initiatives for the OpenADMET project.
slolab/aac-manuscript: First preprint version
Open MIND · 2026-02-16
other
The first version of the preprint for the AAC manuscript.
seandavi/awesome-single-cell: 2026-02-02
Zenodo (CERN European Organization for Nuclear Research) · 2026-02-02
other · Open access
Monthly release for 2026-02-02. This release was automatically created by GitHub Actions. This release triggers an update to the Zenodo record: https://zenodo.org/records/1169173
Chemical Language Model Linker: Blending Text and Molecules with Modular Adapters
Journal of Chemical Information and Modeling · 2025-08-21 · 6 citations
article · Open access · Senior author · Corresponding
The development of large language models and multimodal models has enabled the appealing idea of generating novel molecules from text descriptions. Generative modeling would shift the paradigm from relying on large-scale chemical screening to find molecules with desired properties to directly generating those molecules. However, multimodal models combining text and molecules are often trained from scratch, without leveraging existing high-quality pretrained models. Training from scratch consumes more computational resources and prohibits model scaling. In contrast, we propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML). ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions while still operating in the specialized embedding spaces of the molecular domain. ChemLML can tailor diverse pretrained text models for molecule generation by training relatively few adapter parameters. We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance. SMILES is often preferable despite not guaranteeing valid molecules. We raise issues in using the entire PubChem data set of molecules and their associated descriptions for evaluating molecule generation and provide a filtered version of the data set as a generation test set. To demonstrate how ChemLML could be used in practice, we generate candidate protein inhibitors and use docking to assess their quality and also generate candidate membrane permeable molecules.
Protein Set Transformer: a protein-based genome language model to power high-diversity viromics
Nature Communications · 2025-11-23 · 2 citations
article · Open access
Exponential increases in microbial and viral genomic data demand transformational advances in scalable, generalizable frameworks for their interpretation. Standard homology-based functional analyses are hindered by the rapid divergence of microbial and especially viral genomes and proteins that significantly decreases the volume of usable data. Here, we present Protein Set Transformer (PST), a protein-based genome language model that models genomes as sets of proteins without considering sparsely available functional labels. Trained on >100k viruses, PST outperforms other homology- and language model-based approaches for relating viral genomes based on shared protein content. Further, PST demonstrates protein structural and functional awareness by clustering capsid-fold-containing proteins with known capsid proteins and uniquely clustering late gene proteins within related viruses. Our data establish PST as a valuable method for diverse viral genomics, ecology, and evolutionary applications. We posit that the PST framework can be a foundation model for microbial genomics when trained on suitable data.
Responsible Biodesign Workshop: AI, Protein Design, and the Biosecurity Landscape – Recommended Actions
2025-06-04
preprint · Open access
This report presents Recommended Actions from the January 2025 Responsible Biodesign Workshop, which convened leading experts across AI-enabled biomolecular design and biosecurity policy. Building on existing community commitments for the Responsible Development of AI for Protein Design, the Recommended Actions aim to guide scientists, policy practitioners, and funding bodies in ensuring safe and beneficial development of AI-enabled biomolecular design tools. The Recommended Actions focus on advancing AI-Resilient nucleic acid synthesis security screening, assessing the risk-benefit landscape of biomolecular design capabilities, and building fora for sustained engagement between scientists and policy practitioners.
Biophysics-based protein language models for protein engineering
Nature Methods · 2025-09-01 · 28 citations
article · Open access
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose mutational effect transfer learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics. We fine-tune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.

Recent grants

CAREER: Inference in temporal signaling and transcriptional data
NSF · $887k · 2016–2022
A Machine Learning Platform for Adaptive Chemical Screening
NIH · $2.1M · 2020–2025

Frequent coauthors

Casey S. Greene
152 shared
Halie M. Rando
Smith College
88 shared
Simina M. Boca
AstraZeneca (Brazil)
59 shared
Alexandra Lee
59 shared
Ronan Lordan
University of Pennsylvania
56 shared
Nils Wellhausen
University of Pennsylvania
56 shared
Shengchao Liu
47 shared
Moayad Alnammi
King Fahd University of Petroleum and Minerals
37 shared

Labs

Gitter Lab · PI
Computational biology research group at the University of Wisconsin-Madison and Morgridge Institute

Education

Ph.D., Biostatistics
University of Wisconsin–Madison
2007
M.S., Biostatistics
University of Wisconsin–Madison
2003
B.S., Mathematics
University of Wisconsin–Madison
2001
