Gemini Audio

Audio understanding

Go beyond simple transcription: identify who’s talking and understand the intent behind the words

Get clear and helpful insights directly from audio files. Identify speakers, understand the key points they’ve made – and grasp the sentiment behind those points.

Precise audio analysis

Unlock insights directly from audio files with Gemini’s audio capabilities.

Clear and actionable data

Transform unstructured audio – like voice notes, support calls, or lectures – into clean and actionable notes. Export the result as JSON, as a summary, or as bullet points.
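As a rough illustration of what exporting the same notes as JSON or as bullet points could look like, here is a minimal Python sketch. The Segment class, its field names, and the two helper functions are hypothetical and chosen only for this example; they are not a Gemini output schema or API.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Segment:
    """One stretch of speech. Field names are illustrative assumptions."""
    speaker: str
    start_seconds: float
    text: str


def to_json(segments):
    """Serialize the segments as a JSON array of objects."""
    return json.dumps([asdict(s) for s in segments], indent=2)


def to_bullets(segments):
    """Render the segments as plain-text bullet points, one per segment."""
    return "\n".join(f"- {s.speaker}: {s.text}" for s in segments)


# Hand-written sample data standing in for the output of an audio model.
segments = [
    Segment("Speaker 1", 0.0, "Thanks for calling support."),
    Segment("Speaker 2", 2.5, "My order arrived damaged."),
]

print(to_json(segments))
print(to_bullets(segments))
```

Keeping one in-memory structure and rendering it in several ways means the JSON and the bullet points cannot drift apart.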

Precise speaker identification

Accurately distinguish and label multiple speakers within a single transcript, for clear and correct attribution in interviews, panels, and meetings.

Accurate speech sentiment analysis

Capture more than simple words. Record the sentiment and style of each person’s speech – all the bits that make speaking human.


Advanced audio understanding capabilities

A unified voice experience that cleans up speech, understands intent, and executes tasks.

Disfluency clean-up

Filters out awkward pauses, “ums” and “ahs”, and other filler words to produce polished text with accurate punctuation and useful formatting – at the speed of speech.
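To make the idea of disfluency clean-up concrete, here is a deliberately simple rule-based Python sketch. It is not how Gemini performs the task; the filler pattern and the clean_disfluencies helper are assumptions made for illustration, and a model-based approach would handle context far better than a regular expression.

```python
import re

# Hypothetical filler pattern: "um", "uh", "ah", "er" and stretched variants
# such as "umm" or "ahh", together with any commas and spaces around them.
_FILLER_RE = re.compile(
    r",?\s*\b(?:u+m+|u+h+|a+h+|e+r+)\b,?\s*",
    flags=re.IGNORECASE,
)


def clean_disfluencies(text):
    """Remove simple filler words, then tidy spacing, casing, and punctuation."""
    text = _FILLER_RE.sub(" ", text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    if not text:
        return ""
    # Capitalize the first character and close the sentence if needed.
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


print(clean_disfluencies("Um, so I think, uh, we should ship on Friday"))
```

A fixed word list like this will misfire on real words that happen to match (for example, a genuine "er" abbreviation), which is one reason production systems rely on context rather than patterns alone.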

Gemini Intelligence

Gemini understands the desired outcome behind your words, allowing you to execute tasks using only your voice.

Adaptable voice editing

Refine your thoughts in the moment – correcting details, clarifying spellings, or shifting your tone without missing a beat.

Context and biasing

Interprets shared images, tables, and code as context, and learns your terminology so that every output uses the names and spellings you expect.

