Gemini Audio

Speech generation

Craft anything from short snippets to long-form speeches, with granular control over style, pace, delivery, and performance.

Models

Build agents capable of realistic speech with professional-grade audio and low latency – ready for deployment at any scale.

Performance

Our speech generation models produce audio quickly without compromising vocal stability or expressive quality.

Source: Artificial Analysis Text to Speech (TTS), data as of April 15, 2026.

Artificial Analysis Text to Speech (TTS) Arena Quality Elo

Models are ranked using an Elo rating system derived from user votes in blind comparisons in the Speech Arena. Users listen to pairs of speech samples generated from the same text and choose which one sounds more natural. A higher Elo score indicates that listeners preferred a model's speech more often.
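
To illustrate how pairwise votes can turn into a ranking, here is a minimal sketch of the classic sequential Elo update. It is not necessarily the procedure Artificial Analysis uses, which this page does not specify. The K-factor of 32, the starting rating of 1000, and the model names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(votes, k=32.0, initial=1000.0):
    """Apply the classic Elo update to a sequence of (winner, loser) votes.

    k and initial are illustrative assumptions, not values from the arena.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        # The winner gains more when the win was less expected.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred sample's model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = update_ratings(votes)
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Because every update adds to the winner exactly what it subtracts from the loser, the total rating across models stays constant; only the relative ordering carries meaning.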

Model information

Name: 3.1 Flash TTS
Status: Preview
Input: Text
Output: Audio
Input tokens: 16k
Output tokens: 32k
Knowledge cutoff: January 2025
Availability:
  • Google AI Studio
  • Gemini API
  • Gemini Enterprise Agent Platform
  • Google Vids
Documentation: View developer docs
Model card: View model card
