Gemini Audio

Audio understanding

Go beyond simple transcription, identify who’s talking, and understand the intent behind the words

Get clear and helpful insights directly from audio files. Identify speakers, understand the key points they’ve made, and grasp the sentiment behind those points.

Precise audio analysis

Unlock insights directly from audio files with Gemini’s audio capabilities.

Diagram showing an audio waveform icon transforming into a text icon, representing the conversion of audio into structured notes.

Clear and actionable data

Transform unstructured audio – like voice notes, support calls, or lectures – into clean and actionable notes. Export them as JSON, as a summary, or as bullet points.
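As a rough illustration of working with a JSON export, the sketch below parses a hypothetical structured-notes payload and renders it as bullet points. The field names (`summary`, `key_points`, `speaker`, `text`) and the sample content are assumptions made for this example, not a documented Gemini output schema.

```python
import json

# Hypothetical structured notes for a support call. The schema is an
# assumption for illustration only, not a documented Gemini format.
SAMPLE_NOTES = """
{
  "summary": "Customer reports a failed payment and asks for a refund.",
  "key_points": [
    {"speaker": "Speaker 1", "text": "Payment failed twice on checkout."},
    {"speaker": "Speaker 2", "text": "Refund will be issued within 5 days."}
  ]
}
"""


def notes_to_bullets(raw_json: str) -> str:
    """Render the structured notes as a plain-text bullet list."""
    notes = json.loads(raw_json)
    lines = [f"Summary: {notes['summary']}"]
    for point in notes["key_points"]:
        lines.append(f"- {point['speaker']}: {point['text']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(notes_to_bullets(SAMPLE_NOTES))
```

Keeping the notes as JSON first and deriving the summary or bullet view from it means the same export could feed both a human-readable report and downstream tooling.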

A diagram illustrating speaker identification, with an audio waveform icon branching into two separate person icons labeled "Speaker 1" and "Speaker 2."

Precise speaker identification

Accurately distinguish and label multiple speakers within a single transcript, for clear and correct attribution in interviews, panels, and meetings.
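To make speaker attribution concrete, here is a minimal sketch of how a speaker-labeled transcript might be consumed once it exists. The segment layout (speaker label, start time in seconds, text) and the sample lines are hypothetical; the page does not specify the actual output format.

```python
from collections import defaultdict

# Hypothetical speaker-labeled segments: (speaker label, start seconds, text).
SEGMENTS = [
    ("Speaker 1", 0.0, "Thanks for joining the panel."),
    ("Speaker 2", 3.2, "Happy to be here."),
    ("Speaker 1", 5.1, "Let's start with the roadmap."),
]


def group_by_speaker(segments):
    """Collect each speaker's utterances in the order they were spoken."""
    grouped = defaultdict(list)
    for speaker, _start, text in sorted(segments, key=lambda s: s[1]):
        grouped[speaker].append(text)
    return dict(grouped)


def format_transcript(segments):
    """Render a time-ordered transcript with one attributed line per segment."""
    ordered = sorted(segments, key=lambda s: s[1])
    return "\n".join(
        f"[{start:05.1f}s] {speaker}: {text}" for speaker, start, text in ordered
    )


if __name__ == "__main__":
    print(format_transcript(SEGMENTS))
    print(group_by_speaker(SEGMENTS))
```

Grouping by label is what makes per-person follow-ups possible, such as listing everything one panelist said in a meeting.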

A diagram illustrating speech sentiment analysis, with an audio waveform icon branching into three separate icons labeled "Laughter," "Sighs," and "Whisper."

Accurate speech sentiment analysis

Capture more than simple words. Record the sentiment and style of each person’s speech – all the bits that make speaking human.

Advanced audio understanding capabilities

A unified voice experience that cleans up speech, understands intent, and executes tasks.


Disfluency clean-up

Filters awkward pauses, “ums” and “ahs”, and other filler words, to produce polished text with accurate punctuation and useful formatting – at the speed of speech.
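For intuition only, the toy sketch below shows what disfluency clean-up does to a raw transcript. It uses a simple regular-expression heuristic, which is a stand-in chosen for this example and not how Gemini performs the task; the filler list and the `clean_disfluencies` helper are hypothetical.

```python
import re

# A small, hand-picked filler list. This is an illustrative assumption;
# a real system would handle far more cases and use context.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)


def clean_disfluencies(text: str) -> str:
    """Remove common filler words, then tidy spacing and punctuation."""
    cleaned = FILLERS.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    if cleaned:
        # Capitalize the first letter and make sure the sentence is closed.
        cleaned = cleaned[0].upper() + cleaned[1:]
        if cleaned[-1] not in ".!?":
            cleaned += "."
    return cleaned


if __name__ == "__main__":
    print(clean_disfluencies("um, so I think, uh, we should ship it on friday"))
```

A word list like this cannot tell a filler "ah" from a meaningful one, which is the kind of ambiguity a model that understands context is meant to resolve.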

Gemini Intelligence

Gemini understands the desired outcome behind your words, allowing you to execute tasks using only your voice.

Adaptable voice editing

Refine your thoughts in the moment: correct details, clarify spellings, or shift your tone without missing a beat.

Context and biasing

Interprets shared images, tables, and code as context, and adapts to your nomenclature so that every output is framed the way you need.

Try Audio understanding
