RecurrentGemma

A family of open models using a novel recurrent architecture for faster processing of long sequences.

RecurrentGemma is based on Griffin, a hybrid model architecture that mixes gated linear recurrences with local sliding window attention.
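To make the two ingredients concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not the Griffin or RecurrentGemma implementation: the recurrence uses the generic form h_t = a_t * h_(t-1) + (1 - a_t) * x_t on scalars, and the function names `gated_linear_recurrence` and `sliding_window_mask` are hypothetical.

```python
def gated_linear_recurrence(xs, gates):
    """Run a simplified gated linear recurrence over a sequence.

    Assumed update rule (illustrative, not Griffin's exact layer):
        h_t = a_t * h_{t-1} + (1 - a_t) * x_t
    where a_t in [0, 1] is the gate for step t.
    """
    h = 0.0  # the entire state is one value, regardless of sequence length
    outputs = []
    for x, a in zip(xs, gates):
        h = a * h + (1.0 - a) * x
        outputs.append(h)
    return outputs


def sliding_window_mask(seq_len, window):
    """Causal local-attention mask.

    mask[i][j] is True when position i may attend to position j,
    i.e. j is one of the `window` most recent positions up to and including i.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    print(gated_linear_recurrence([1.0, 1.0, 0.0], [0.5, 0.5, 0.5]))
    for row in sliding_window_mask(seq_len=4, window=2):
        print(row)
```

The recurrence carries a fixed-size state forward one step at a time, while the mask restricts each position to a bounded window of recent positions; a hybrid model interleaves layers of both kinds.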

Capabilities

Reduced memory usage

Lower memory requirements allow for the generation of longer samples on devices with limited memory, like single GPUs or CPUs.
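One way to see where such savings could come from is to count how many past positions an attention layer has to keep cached as generation proceeds. The sketch below is a rough illustration under assumptions: the helper `cached_positions` and the window size of 2048 are chosen for illustration only, and real memory use also depends on layer count, head dimensions, and data types.

```python
def cached_positions(seq_len, window=None):
    """Number of past positions whose keys/values an attention layer keeps.

    window=None models global attention (cache grows with the sequence);
    an integer models local sliding window attention (cache is capped).
    """
    if window is None:
        return seq_len
    return min(seq_len, window)


if __name__ == "__main__":
    # Illustrative window size; not taken from the text above.
    for n in (1_000, 10_000, 100_000):
        print(n, cached_positions(n), cached_positions(n, window=2048))
```

With global attention the count grows linearly with the sequence, whereas a sliding window caps it at the window size, and a linear recurrence keeps only its fixed-size state.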

Higher throughput

Runs inference at significantly higher batch sizes and generates substantially more tokens per second, especially on long sequences.

High performance

RecurrentGemma matches Gemma's performance while requiring less memory and achieving faster inference.

