RecurrentGemma

RecurrentGemma is based on Griffin, a hybrid model architecture that mixes gated linear recurrences with local sliding window attention.
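As a rough illustration of the two ingredients named above, the sketch below shows a simplified gated linear recurrence and a causal sliding-window attention mask in plain Python. It is not the Griffin or RecurrentGemma implementation: the update rule `h_t = a_t * h_{t-1} + (1 - a_t) * x_t`, the scalar state, and the function names `gated_linear_recurrence` and `sliding_window_mask` are assumptions chosen for readability.

```python
def gated_linear_recurrence(xs, gates, h0=0.0):
    """Scan h_t = a_t * h_{t-1} + (1 - a_t) * x_t over a sequence of scalars.

    Simplified, hypothetical form: each gate a_t in [0, 1] decides how much
    of the previous state is kept versus how much of the new input is mixed in.
    """
    h = h0
    states = []
    for x, a in zip(xs, gates):
        # The update is linear in h: no nonlinearity is applied to the carried state.
        h = a * h + (1.0 - a) * x
        states.append(h)
    return states


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal and local: position i sees only itself and the previous window - 1 tokens.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    print(gated_linear_recurrence([1.0, 1.0, 0.0], [0.5, 0.5, 0.5]))
    for row in sliding_window_mask(seq_len=4, window=2):
        print(row)
```

The recurrence carries a fixed-size state from token to token, while the mask limits attention to a bounded number of recent tokens, so neither component needs to keep the full history in memory.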

Lower memory requirements make it possible to generate longer samples on memory-constrained devices, such as a single GPU or CPU.
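To make the memory argument concrete, here is a back-of-the-envelope sketch comparing how many cached values a model must hold during generation under three schemes: global attention, local sliding-window attention, and a fixed-size recurrent state. All function names and the configuration numbers (layers, heads, dimensions, window size) are hypothetical placeholders, not RecurrentGemma's actual settings.

```python
def global_attention_cache_entries(seq_len, num_layers, num_kv_heads, head_dim):
    """Keys and values are stored for every past token, so the cache grows with seq_len."""
    return 2 * seq_len * num_layers * num_kv_heads * head_dim


def local_attention_cache_entries(seq_len, window, num_layers, num_kv_heads, head_dim):
    """Only the most recent `window` tokens are kept, so the cache is bounded."""
    return 2 * min(seq_len, window) * num_layers * num_kv_heads * head_dim


def recurrent_state_entries(num_layers, state_dim):
    """A recurrent layer carries one fixed-size state, independent of seq_len."""
    return num_layers * state_dim


if __name__ == "__main__":
    # Hypothetical configuration, chosen only to show the scaling behaviour.
    cfg = dict(num_layers=8, num_kv_heads=1, head_dim=256)
    for seq_len in (1_000, 10_000, 100_000):
        g = global_attention_cache_entries(seq_len, **cfg)
        l = local_attention_cache_entries(seq_len, window=2_048, **cfg)
        r = recurrent_state_entries(num_layers=8, state_dim=2_560)
        print(f"seq_len={seq_len}: global={g} local={l} recurrent={r}")
```

Under these assumptions the global cache grows linearly with sequence length, while the local cache stops growing once the window is full and the recurrent state stays constant throughout.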

RecurrentGemma runs inference at significantly higher batch sizes and generates substantially more tokens per second, especially on long sequences.

RecurrentGemma matches Gemma's performance while requiring less memory and achieving faster inference.
