Abstract
Therapeutic development is a costly and high-risk endeavor that is often plagued by high failure rates. To address this, we introduce TxGemma, a suite of efficient, generalist large language models (LLMs) capable of therapeutic property prediction as well as interactive reasoning and explainability. Unlike task-specific models, TxGemma synthesizes information from diverse sources, enabling broad application across the therapeutic development pipeline. The suite includes 2B, 9B, and 27B parameter models, fine-tuned from Gemma-2 on a comprehensive dataset of small molecules, proteins, nucleic acids, diseases, and cell lines. Out of 66 therapeutic development tasks, TxGemma outperforms or nearly matches the state-of-the-art generalist model on 64 (outperforms on 45), and state-of-the-art specialist models on 50 (outperforms on 26). Beyond these predictive capabilities, TxGemma also features conversational models that bridge the gap between general LLMs and specialized property predictors. Scientists can interact with these models in natural language, and the models provide mechanistic reasoning for their predictions based on molecular structure and engage in scientific discussions. Building on this, we also introduce TxAgent, a generalist therapeutic agent powered by Gemini 2.0 that reasons, acts, manages diverse workflows, and acquires external domain knowledge. TxAgent surpasses prior state-of-the-art models on the Humanity’s Last Exam benchmark (Chemistry & Biology tasks) with 9.8% relative improvement over o3-mini, an advanced reasoning model, and 17.9% over o1. On ChemBench, TxGemma excels with improvements of 5.6% (ChemBench-Preference) and 1.1% (ChemBench-Mini) over o3-mini, as well as 17.0% and 4.3% over o1, respectively. Wet-lab validation of TxGemma predictions remains an important future step.
TxGemma will be released as an open model, which enables researchers to adapt and validate it on their own diverse datasets, and might facilitate more challenging real-world therapeutic applications.
Authors
Eric Wang, Samuel Schmidgall, Paul F. Jaeger, Fan Zhang, Rory Pilgrim, Yossi Matias, Joelle Barral, David Fleet, Shekoofeh Azizi
Venue
arXiv