Live dialogue
Fluid and natural live dialogue and translation capabilities, for powerful voice-first applications.
Our most advanced audio models push new frontiers with intuitive inputs, intelligent understanding, and natural expressiveness.
Best for low-latency, fluid and natural vocal rhythm. Solves complex tasks while recognizing nuances in voices like pitch and pace.
Best for directing intonation and inflection. Intuitive audio tags give you granular command over style, pace, and tone with unprecedented precision.
Natural and powerful audio models. Helping people communicate, developers build, and enterprises manage business.
Craft anything from short snippets to long-form narratives, with granular control over style, pace, delivery and performance.
Go beyond simple transcription, with models that identify who’s talking and understand the intent behind the words.
Holds fluid, natural, low-latency conversations while calling functions to manage complex, multi-step, large-scale tasks.
We’ve proactively assessed potential risks during every stage of the development process for these native audio features, using what we’ve learned to inform our mitigation strategies. We validate these measures through rigorous internal and external safety evaluations, including comprehensive red teaming for responsible deployment.
All audio outputs from our models are marked with SynthID, our advanced watermarking technology, allowing you to detect whether speech has been created or edited using Google AI.