Natural and powerful audio models that help people communicate, developers build, and enterprises run their business.

Capabilities

Talk in real time. Control with precision. Understand every nuance.


Models

Our audio models generate natural-sounding speech at speed and scale for different developer workflows.


Hands-on

Explore what you can do with Gemini Audio

Showcasing Gemini Flash Live

Multi-step task management

Holds fluid, natural, low-latency conversations while calling functions to manage complex, multi-step tasks at scale.

Showcasing Gemini Flash TTS

Expressive speech generation

Best for directing intonation and inflection. Intuitive audio tags give you granular control over style, pace, and tone.


Safety

Building with responsibility at the core

We’ve proactively assessed potential risks during every stage of the development process for these native audio features, using what we’ve learned to inform our mitigation strategies. We validate these measures through rigorous internal and external safety evaluations, including comprehensive red teaming for responsible deployment.

All audio outputs from our models are marked with SynthID, our advanced watermarking technology, allowing you to detect whether speech has been created or edited using Google AI.


Try Gemini Audio