Gemini Audio

Our most advanced audio models push new frontiers with intuitive inputs, intelligent understanding, and natural expressiveness.

Explore the latest

Natural and powerful audio models that help people communicate, developers build, and enterprises manage their business.

Capabilities

Talk in real time. Control with precision. Understand every nuance.

Models

Our audio models generate natural vocals at speed and scale for different developer workflows.

3.1 Flash Live

Best for low-latency conversation with a fluid, natural vocal rhythm. Solves complex tasks while recognizing vocal nuances such as pitch and pace.

Try it in Google AI Studio

3.5 Live Translate

Best for real-time speech-to-speech translation. Overcomes language barriers across 70+ languages while maintaining the speaker’s natural tone and rhythm.

Try it in Google AI Studio

3.1 Flash TTS

Best for directing intonation and inflection. Intuitive audio tags give you granular command over style, pace, and tone with unprecedented precision.

Try it in Google AI Studio

Hands-on

Explore what you can do with Gemini Audio

Showcasing Gemini Flash Live

Multi-step task management

Holds fluid, natural, low-latency conversations while calling functions to manage complex, multi-step tasks at scale.

Showcasing Gemini Flash TTS

Expressive speech generation

Directs intonation and inflection through audio tags that control style, pace, and tone.

Showcasing 3.5 Live Translate

Real-time meeting translation

Translates multiple languages in a single session, while preserving each speaker’s original intonation, pacing and pitch.

Safety

Building with responsibility at the core

We’ve proactively assessed potential risks during every stage of the development process for these native audio features, using what we’ve learned to inform our mitigation strategies. We validate these measures through rigorous internal and external safety evaluations, including comprehensive red teaming for responsible deployment.

All audio outputs from our models are marked with SynthID, our advanced watermarking technology, allowing you to detect whether speech has been created or edited using Google AI.

Try Gemini Audio

Google AI Studio

The fastest path from prompt to production

Try in Google AI Studio

Google Vids

AI-powered video creation for work

Try in Google Vids

Gemini API

Get started with cutting-edge AI models 

Gemini Live API

Low-latency, real-time voice and video interactions with Gemini

Gemini Enterprise Agent Platform

Build, scale, and govern agents

Gemini Enterprise for Customer Experience

Deploy specialized agents for product discovery, shopping, and customer service

Google Translate

Understand your world and communicate across languages
