Gemini Pro

Our best model for general performance across a wide range of tasks

Natively multimodal, with an updated long context window of up to two million tokens — the longest of any large-scale foundation model.

Demo

Longer context

Gemini 1.5 Pro introduces a breakthrough context window of up to two million tokens, the longest context window of any large-scale foundation model yet. It achieves near-perfect recall on long-context retrieval tasks across modalities, unlocking the ability to accurately process large-scale documents, thousands of lines of code, hours of audio and video, and more.
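To get a feel for what a two-million-token window means in practice, here is a minimal sketch that estimates whether a set of text files would fit. It assumes roughly four characters per token, which is only a rule of thumb and not the model's real tokenizer; the names `estimate_tokens`, `fits_in_context`, and the `reserve_for_output` default are hypothetical and chosen for illustration.

```python
from typing import Iterable

# Assumption: ~4 characters per token is a rough heuristic, NOT the real tokenizer.
CHARS_PER_TOKEN = 4
# "Up to two million tokens", as stated in the text above.
CONTEXT_WINDOW_TOKENS = 2_000_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate using ceiling division on the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(texts: Iterable[str], reserve_for_output: int = 8_192) -> bool:
    """Return True if the estimated input plus a reserved output budget fits.

    reserve_for_output is a hypothetical safety margin, not a documented limit.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= CONTEXT_WINDOW_TOKENS
```

An exact count would have to come from the model's own tokenizer, so treat the result as a first-pass filter rather than a guarantee.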

Better understanding across modalities

Gemini 1.5 Pro can perform highly sophisticated reasoning tasks using text, images, audio, and video. Here, it uses a hand-drawn picture to pinpoint a scene in a Buster Keaton movie.

Reasoning about vast amounts of information

Watch Gemini 1.5 Pro analyze and summarize the 402-page transcript from Apollo 11’s mission to the moon.

Problem-solving with longer blocks of code

See Gemini 1.5 Pro reason across 100,000 lines of code and give helpful solutions, modifications, and explanations.

Pros on Pro

Developers have been putting 1.5 Pro to the test using Google AI Studio and Vertex AI.

Relentless innovation

Our research team is continually exploring new ideas at the frontier of AI, building innovative products that show consistent progress on a range of benchmarks.

Capability	Benchmark	Description	Gemini 1.5 Flash (May 2024)	Gemini 1.5 Flash (Sep 2024)	Gemini 1.5 Pro (May 2024)	Gemini 1.5 Pro (Sep 2024)
General	MMLU-Pro	Enhanced version of the popular MMLU dataset, with questions in 57 subjects (incl. STEM, humanities, and others) and higher-difficulty tasks	59.1%	67.3%	69.0%	75.8%
Code	Natural2Code	Code generation across Python, Java, C++, JS, and Go. Held-out, HumanEval-like dataset, not leaked on the web	77.2%	79.8%	82.6%	85.4%
Math	MATH	Challenging math problems (incl. algebra, geometry, pre-calculus, and others)	54.9%	77.9%	67.7%	86.5%
Math	HiddenMath	Competition-level math problems. Held-out, AIME/AMC-like dataset, crafted by experts and not leaked on the web	20.3%	47.2%	28.0%	52.0%
Reasoning	GPQA (diamond)	Challenging dataset of questions written by domain experts in biology, physics, and chemistry	41.4%	51.0%	46.0%	59.1%
Multilingual	WMT23	Language translation	74.1	73.9	75.3	75.1
Long Context	RULER (at 1M)	Diagnostic suite checking long-context ability of the models over a range of tasks	69.6%	82.3%	40.1%	86.4%
Long Context	MRCR (1M)	Diagnostic long-context understanding evaluation	70.1%	71.9%	70.5%	82.6%
Image	MMMU	Multi-discipline college-level reasoning problems	56.1%	62.3%	62.2%	65.9%
Image	Vibe-Eval (Reka)	Visual understanding in chat models with challenging everyday examples. Evaluated with a Gemini Flash model as a rater	44.8%	48.9%	48.9%	53.9%
Image	MathVista	Mathematical reasoning in visual contexts	58.4%	65.8%	63.9%	68.1%
Audio	FLEURS (55 languages)	Automatic speech recognition (based on word error rate, lower is better)	9.8%	9.6%	6.5%	6.7%
Video	Video-MME	Video analysis across multiple domains	74.7%	76.1%	77.9%	78.6%
Safety	XSTest	Measures how often models refuse to respond to safe/benign prompts. The score represents how frequently models correctly fulfill requests	86.9%	97.0%	88.4%	98.8%
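For readers who want to see the May-to-September change at a glance, the sketch below computes the Gemini 1.5 Pro difference for each benchmark in the table above. The scores are copied from the table; the helper names `pro_delta` and `pro_improved` are hypothetical. FLEURS is treated as lower-is-better because its description says so, and treating every other benchmark (including WMT23, whose metric the table does not name) as higher-is-better is an assumption.

```python
from typing import Dict, Tuple

# (Flash May 2024, Flash Sep 2024, Pro May 2024, Pro Sep 2024), copied from the table.
SCORES: Dict[str, Tuple[float, float, float, float]] = {
    "MMLU-Pro": (59.1, 67.3, 69.0, 75.8),
    "Natural2Code": (77.2, 79.8, 82.6, 85.4),
    "MATH": (54.9, 77.9, 67.7, 86.5),
    "HiddenMath": (20.3, 47.2, 28.0, 52.0),
    "GPQA (diamond)": (41.4, 51.0, 46.0, 59.1),
    "WMT23": (74.1, 73.9, 75.3, 75.1),
    "RULER (at 1M)": (69.6, 82.3, 40.1, 86.4),
    "MRCR (1M)": (70.1, 71.9, 70.5, 82.6),
    "MMMU": (56.1, 62.3, 62.2, 65.9),
    "Vibe-Eval (Reka)": (44.8, 48.9, 48.9, 53.9),
    "MathVista": (58.4, 65.8, 63.9, 68.1),
    "FLEURS (55 languages)": (9.8, 9.6, 6.5, 6.7),
    "Video-MME": (74.7, 76.1, 77.9, 78.6),
    "XSTest": (86.9, 97.0, 88.4, 98.8),
}

# Word error rate: the table states that lower is better.
# Assumption: every other benchmark is higher-is-better.
LOWER_IS_BETTER = {"FLEURS (55 languages)"}


def pro_delta(benchmark: str) -> float:
    """Sep 2024 score minus May 2024 score for Gemini 1.5 Pro, in score points."""
    _, _, may, sep = SCORES[benchmark]
    return round(sep - may, 1)


def pro_improved(benchmark: str) -> bool:
    """True if the Sep 2024 Pro score is better than the May 2024 Pro score."""
    delta = pro_delta(benchmark)
    return delta < 0 if benchmark in LOWER_IS_BETTER else delta > 0


if __name__ == "__main__":
    for name in SCORES:
        marker = "improved" if pro_improved(name) else "not improved"
        print(f"{name}: {pro_delta(name):+.1f} ({marker})")
```

Under these assumptions, Pro improves on every benchmark except WMT23 and FLEURS, where the two versions differ by 0.2 points.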