
Anthony Gitter
Associate Professor; Investigator, Morgridge Institute for Research · University of Wisconsin-Madison · Biostatistics and Medical Informatics
Active 1967–2026
About
Anthony Gitter is an associate professor in the Department of Biostatistics and Medical Informatics and affiliate faculty in the Department of Computer Sciences at the University of Wisconsin-Madison. He also holds the Jeanne M. Rowe Chair at the Morgridge Institute for Research in the John W. and Jeanne M. Rowe Center for Research in Virology and Research Computing. Additionally, he is an affiliate of the Data Science Institute and the Center for Genomic Science Innovation, and a member of the UW Carbone Cancer Center Cancer Genetic and Epigenetic Mechanisms Scientific Program. His research group focuses on using network modeling to integrate genomic, transcriptomic, and proteomic data to provide a cohesive view of biological processes, with a special emphasis on virology and oncology. They also explore machine learning applications in biochemistry, including computationally-guided chemical screening and protein engineering. Anthony Gitter received his Ph.D. in Computer Science from Carnegie Mellon University and completed a joint postdoctoral fellowship at Microsoft Research New England and the Massachusetts Institute of Technology.
Research topics
- Computer Science
- Artificial Intelligence
- Machine Learning
- Data Mining
- Genetics
- Software engineering
- Biology
- Computational biology
- Database
- Statistics
- Mathematics
- Mathematics education
- Psychology
Selected publications
A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data
ChemRxiv · 2026-01-06
article · Open access
Computational blind challenges offer critical, unbiased assessment opportunities to assess and accelerate scientific progress, as demonstrated by a breadth of breakthroughs over the last decade. We report the outcomes and key insights from an open science community blind challenge focused on computational methods in drug discovery, using lead optimization data from the AI-driven Structure-enabled Antiviral Platform (ASAP) Discovery Consortium’s pan-coronavirus antiviral discovery program, in partnership with Polaris and the OpenADMET project. This collaborative initiative invited global participants from both academia and industry to develop and apply computational methods to predict the biochemical potency and crystallographic ligand poses of small molecules against key coronavirus targets, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (Mpro), as well as multiple ADMET assay endpoints, using previously undisclosed comprehensive experimental drug discovery datasets as benchmarks. By evaluating submissions across multiple tasks and compounds, we established performance leaderboards and conducted meta-analyses to assess methodological strengths, common pitfalls, and areas for improvement. This analysis provides a foundation for best practices in real-world machine learning evaluation, grounded in community-driven benchmarking. We also highlight how next-generation platforms, such as Polaris, enable rigorous challenge design, embedded evaluation frameworks, and broad community engagement. This paper reports the collective findings of the challenge, offering a high-level overview of the data, evaluation infrastructure, and top-performing strategies. We further provide context and support for the accompanying papers authored by the challenge participants in this special issue, which explore individual approaches in greater depth.
Together, these contributions aim to advance reproducible, trustworthy, and high-impact computational methods in drug discovery, and to explore best practices and pitfalls in future blind challenge design and execution, including planned initiatives for the OpenADMET project.
HaihuaWang-hub/2020-workflows-paper: 2020 workflow paper
Zenodo (CERN European Organization for Nuclear Research) · 2026-01-20
other · Open access
2020 workflow paper
A Computational Community Blind Challenge on Pan-Coronavirus Drug Discovery Data.
Apollo (University of Cambridge) · 2026-03-23
article · Open access
Computational blind challenges offer critical, unbiased opportunities to assess and accelerate scientific progress, as demonstrated by a breadth of breakthroughs over the past decade. We report the outcomes and key insights from an open science community blind challenge focused on computational methods in drug discovery, using lead optimization data from the AI-driven Structure-enabled Antiviral Platform Discovery Consortium's pan-coronavirus antiviral discovery program, in partnership with Polaris and the OpenADMET project. This collaborative initiative invited global participants from both academia and industry to develop and apply computational methods to predict the biochemical potency and crystallographic ligand poses of small molecules against key coronavirus targets, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (Mpro), as well as multiple ADMET assay end points, using previously undisclosed comprehensive experimental drug discovery data sets as benchmarks. By evaluating submissions across multiple tasks and compounds, we established performance leaderboards and conducted meta-analyses to assess methodological strengths, common pitfalls, and areas for improvement. This analysis provides a foundation for best practices in real-world machine learning evaluation, grounded in community-driven benchmarking. We also highlight how next-generation platforms, such as Polaris, enable rigorous challenge design, embedded evaluation frameworks, and broad community engagement. This paper reports the collective findings of the challenge, offering a high-level overview of the data, evaluation infrastructure, and top-performing strategies. We further provide context and support for the accompanying papers authored by the challenge participants in this special issue, which explore individual approaches in greater depth.
Together, these contributions aim to advance reproducible, trustworthy, and high-impact computational methods in drug discovery, and to explore best practices and pitfalls in future blind challenge design and execution, including planned initiatives for the OpenADMET project.
slolab/aac-manuscript: First preprint version
Open MIND · 2026-02-16
other
The first version of the preprint for the AAC manuscript.
seandavi/awesome-single-cell: 2026-02-02
Zenodo (CERN European Organization for Nuclear Research) · 2026-02-02
other · Open access
Monthly release for 2026-02-02. This release was automatically created by GitHub Actions.
This release triggers an update to the Zenodo record: https://zenodo.org/records/1169173
Chemical Language Model Linker: Blending Text and Molecules with Modular Adapters
Journal of Chemical Information and Modeling · 2025-08-21 · 6 citations
article · Open access · Senior author · Corresponding
The development of large language models and multimodal models has enabled the appealing idea of generating novel molecules from text descriptions. Generative modeling would shift the paradigm from relying on large-scale chemical screening to find molecules with desired properties to directly generating those molecules. However, multimodal models combining text and molecules are often trained from scratch, without leveraging existing high-quality pretrained models. Training from scratch consumes more computational resources and prohibits model scaling. In contrast, we propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML). ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions while still operating in the specialized embedding spaces of the molecular domain. ChemLML can tailor diverse pretrained text models for molecule generation by training relatively few adapter parameters. We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance. SMILES is often preferable despite not guaranteeing valid molecules. We raise issues in using the entire PubChem data set of molecules and their associated descriptions for evaluating molecule generation and provide a filtered version of the data set as a generation test set. To demonstrate how ChemLML could be used in practice, we generate candidate protein inhibitors and use docking to assess their quality and also generate candidate membrane permeable molecules.
Protein Set Transformer: a protein-based genome language model to power high-diversity viromics
Nature Communications · 2025-11-23 · 2 citations
article · Open access
Exponential increases in microbial and viral genomic data demand transformational advances in scalable, generalizable frameworks for their interpretation. Standard homology-based functional analyses are hindered by the rapid divergence of microbial and especially viral genomes and proteins that significantly decreases the volume of usable data. Here, we present Protein Set Transformer (PST), a protein-based genome language model that models genomes as sets of proteins without considering sparsely available functional labels. Trained on >100k viruses, PST outperforms other homology- and language model-based approaches for relating viral genomes based on shared protein content. Further, PST demonstrates protein structural and functional awareness by clustering capsid-fold-containing proteins with known capsid proteins and uniquely clustering late gene proteins within related viruses. Our data establish PST as a valuable method for diverse viral genomics, ecology, and evolutionary applications. We posit that the PST framework can be a foundation model for microbial genomics when trained on suitable data.
2025-06-04
preprint · Open access
This report presents Recommended Actions from the January 2025 Responsible Biodesign Workshop, which convened leading experts across AI-enabled biomolecular design and biosecurity policy. Building on existing community commitments for the Responsible Development of AI for Protein Design, the Recommended Actions aim to guide scientists, policy practitioners, and funding bodies in ensuring safe and beneficial development of AI-enabled biomolecular design tools. The Recommended Actions focus on advancing AI-Resilient nucleic acid synthesis security screening, assessing the risk-benefit landscape of biomolecular design capabilities, and building fora for sustained engagement between scientists and policy practitioners.
Biophysics-based protein language models for protein engineering
Nature Methods · 2025-09-01 · 28 citations
article · Open access
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose mutational effect transfer learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics. We fine-tune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.
Recent grants
CAREER: Inference in temporal signaling and transcriptional data
NSF · $887k · 2016–2022
A Machine Learning Platform for Adaptive Chemical Screening
NIH · $2.1M · 2020–2025
Frequent coauthors
- Casey S. Greene · 152 shared
- Halie M. Rando · Smith College · 88 shared
- Simina M. Boca · AstraZeneca (Brazil) · 59 shared
- Alexandra Lee · 59 shared
- Ronan Lordan · University of Pennsylvania · 56 shared
- Nils Wellhausen · University of Pennsylvania · 56 shared
- Shengchao Liu · 47 shared
- Moayad Alnammi · King Fahd University of Petroleum and Minerals · 37 shared
Labs
Computational biology research group at the University of Wisconsin-Madison and Morgridge Institute
Education
- 2007 · Ph.D., Biostatistics · University of Wisconsin–Madison
- 2003 · M.S., Biostatistics · University of Wisconsin–Madison
- 2001 · B.S., Mathematics · University of Wisconsin–Madison