T5Gemma
A collection of encoder-decoder models that offer a strong trade-off between quality and inference efficiency
T5Gemma adapts pretrained decoder-only Gemma 2 models into an encoder-decoder architecture. These models are trained with either PrefixLM for strong generative performance or UL2 for high-quality contextual representations.
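The two pretraining objectives can be illustrated with a toy sketch (the helper names are hypothetical, and only a single corrupted span is shown; real UL2 mixes several denoisers with varying corruption rates):

```python
def prefix_lm_split(tokens, prefix_len):
    """PrefixLM: the encoder attends to the prefix bidirectionally;
    the decoder predicts the remaining tokens left to right."""
    return tokens[:prefix_len], tokens[prefix_len:]

def span_corruption(tokens, span_start, span_len, sentinel="<extra_id_0>"):
    """UL2-style span corruption (single span, for illustration):
    the span is replaced by a sentinel in the encoder input,
    and the decoder reconstructs the missing tokens."""
    encoder_input = tokens[:span_start] + [sentinel] + tokens[span_start + span_len:]
    decoder_target = [sentinel] + tokens[span_start:span_start + span_len]
    return encoder_input, decoder_target

tokens = ["The", "quick", "brown", "fox", "jumps"]

enc, dec = prefix_lm_split(tokens, 3)
# enc = ["The", "quick", "brown"]; dec = ["fox", "jumps"]

ci, ct = span_corruption(tokens, 1, 2)
# ci = ["The", "<extra_id_0>", "fox", "jumps"]
# ct = ["<extra_id_0>", "quick", "brown"]
```

Either way, the split yields an encoder input with full bidirectional context and a decoder target generated autoregressively, which is what the adaptation from a decoder-only Gemma 2 model exploits.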
Capabilities
- Enhanced reasoning: A dedicated encoder significantly boosts performance on tasks requiring deep context comprehension, such as math reasoning (GSM8K).
- Flexible architecture: Model adaptation techniques allow for flexible configurations, including "unbalanced" models where the encoder and decoder have different sizes.
- High efficiency: A superior quality-to-efficiency ratio without extensive compute requirements.
Model variants
- Gemma 2 sizes: Checkpoints based on the official Gemma 2 2B and 9B models, as well as the "unbalanced" 9B-2B checkpoint, which pairs a 9B encoder with a 2B decoder.
- T5 sizes: Small, Base, Large, and XL sizes following the T5 configuration, plus an additional model sized between T5 Large and T5 XL.