Lightweight models, two variants, both optimized for speed and efficiency

Lightweight, fast and cost-efficient models featuring multimodal reasoning and a breakthrough long context window of up to one million tokens.

Small, smaller

Flash now comes in two compact variants, giving you the flexibility for whatever you choose to build.

Performance in a flash

Designed to be fast and efficient to serve at scale.

  • Built for speed

    Sub-second average first-token latency for the vast majority of developer and enterprise use cases.

  • Quality at lower cost

    On most common tasks, 1.5 Flash models achieve comparable quality to larger models, at a fraction of the cost.

  • Long-context understanding

    Process hours of video and audio, and hundreds of thousands of words or lines of code.
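First-token latency is straightforward to measure for any streaming response. The following is a minimal sketch that assumes the response arrives as a Python iterable of text chunks; `time_to_first_chunk` and `fake_stream` are hypothetical names used for illustration, and the simulated stream stands in for a real model client.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(stream: Iterable[str]) -> Tuple[float, Iterator[str]]:
    """Return (seconds until the first chunk, iterator over the full stream).

    Raises StopIteration if the stream yields nothing.
    """
    start = time.perf_counter()
    iterator = iter(stream)
    first = next(iterator)  # blocks until the first chunk arrives
    latency = time.perf_counter() - start

    def replay() -> Iterator[str]:
        # Hand back the chunk already consumed, then the rest of the stream.
        yield first
        yield from iterator

    return latency, replay()


def fake_stream(delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(delay_s)  # simulated wait before the first token
    yield "Hello"
    yield ", world"


if __name__ == "__main__":
    latency, chunks = time_to_first_chunk(fake_stream())
    print(f"first chunk after {latency:.3f}s")
    print("".join(chunks))
```

Averaging this measurement over many requests gives a figure comparable to the "average first-token latency" described above.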

Longer context

Flash models have a one-million-token context window by default, which means you can process one hour of video, 11 hours of audio, codebases with more than 30,000 lines of code, or over 700,000 words.
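As a rough planning aid, these figures can be turned into a back-of-the-envelope capacity check. The sketch below assumes that each quoted figure fills the whole one-million-token window and that usage scales linearly, which is a simplification: real token counts depend on the tokenizer and the content, and the code and word figures are stated as lower bounds. The names `CAPACITY`, `fraction_of_window`, `estimated_tokens`, and `fits` are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000

# Amount of each input type assumed to fill the whole window, taken from
# the figures quoted above (treated as exact here, which they are not).
CAPACITY = {
    "video_hours": 1,
    "audio_hours": 11,
    "code_lines": 30_000,
    "words": 700_000,
}


def fraction_of_window(**amounts: float) -> float:
    """Estimate the share of the context window used by a mixed input."""
    return sum(amount / CAPACITY[kind] for kind, amount in amounts.items())


def estimated_tokens(**amounts: float) -> int:
    """Convert the estimated share of the window into a token count."""
    return round(fraction_of_window(**amounts) * CONTEXT_WINDOW_TOKENS)


def fits(**amounts: float) -> bool:
    """True if the estimated usage does not exceed the window."""
    return fraction_of_window(**amounts) <= 1.0


if __name__ == "__main__":
    print(estimated_tokens(audio_hours=5.5))
    print(fits(video_hours=0.5, words=400_000))
```

For precise budgeting, count tokens with the model's own tokenizer rather than relying on these ratios.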

Relentless innovation

Our research team is continually exploring new ideas at the frontier of AI, building innovative products that show consistent progress on a range of benchmarks.

| Capability | Benchmark | Description | Gemini 1.5 Flash-8B (Oct 2024) | Gemini 1.5 Flash (May 2024) | Gemini 1.5 Flash (Sep 2024) | Gemini 1.5 Pro (May 2024) | Gemini 1.5 Pro (Sep 2024) |
|---|---|---|---|---|---|---|---|
| General | MMLU-Pro | Enhanced version of the popular MMLU dataset, with higher-difficulty questions across multiple subjects | 58.7% | 59.1% | 67.3% | 69.0% | 75.8% |
| Code | Natural2Code | Code generation across Python, Java, C++, JS, and Go. Held-out, HumanEval-like dataset, not leaked on the web | 75.5% | 77.2% | 79.8% | 82.6% | 85.4% |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 58.7% | 54.9% | 77.9% | 67.7% | 86.5% |
| Math | HiddenMath | Competition-level math problems. Held-out, AIME/AMC-like dataset, crafted by experts and not leaked on the web | 32.8% | 20.3% | 47.2% | 28.0% | 52.0% |
| Reasoning | GPQA (diamond) | Challenging dataset of questions written by domain experts in biology, physics, and chemistry | 38.4% | 41.4% | 51.0% | 46.0% | 59.1% |
| Multilingual | WMT23 | Language translation | 72.6 | 74.1 | 73.9 | 75.3 | 75.1 |
| Long Context | MRCR (1M) | Diagnostic long-context understanding evaluation | 54.7% | 70.1% | 71.9% | 70.5% | 82.6% |
| Image | MMMU | Multi-discipline college-level multimodal understanding and reasoning problems | 53.7% | 56.1% | 62.3% | 62.2% | 65.9% |
| Image | Vibe-Eval (Reka) | Visual understanding in chat models with challenging everyday examples. Evaluated with a Gemini Flash model as a rater | 40.9% | 44.8% | 48.9% | 48.9% | 53.9% |
| Image | MathVista | Mathematical reasoning in visual contexts | 54.7% | 58.4% | 65.8% | 63.9% | 68.1% |
| Audio | FLEURS (55 lang) | Automatic speech recognition (based on word error rate, lower is better) | 13.6% | 9.8% | 9.6% | 6.5% | 6.7% |
| Video | Video-MME | Video analysis across multiple domains | 66.2% | 74.7% | 76.1% | 77.9% | 78.6% |
| Safety | XSTest | Measures how often models refuse to respond to safe/benign prompts. The score represents how frequently models correctly fulfill requests | 92.6% | 86.9% | 97.0% | 88.4% | 98.8% |
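To compare releases, it can help to express the change between two versions in both percentage points and relative terms. The sketch below is illustrative only: it copies a handful of the Gemini 1.5 Flash scores reported above (May 2024 and Sep 2024), and the names `FLASH_SCORES`, `point_gain`, and `relative_gain` are hypothetical. FLEURS is left out because it reports an error rate, where lower is better.

```python
# (May 2024, Sep 2024) scores for Gemini 1.5 Flash, copied from the
# benchmark figures above. Higher is better for all of these.
FLASH_SCORES = {
    "MMLU-Pro": (59.1, 67.3),
    "Natural2Code": (77.2, 79.8),
    "MATH": (54.9, 77.9),
    "HiddenMath": (20.3, 47.2),
    "GPQA (diamond)": (41.4, 51.0),
    "MMMU": (56.1, 62.3),
}


def point_gain(old: float, new: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(new - old, 1)


def relative_gain(old: float, new: float) -> float:
    """Change relative to the old score, in percent, rounded to one decimal."""
    return round((new - old) / old * 100, 1)


if __name__ == "__main__":
    for name, (old, new) in FLASH_SCORES.items():
        print(f"{name}: {point_gain(old, new):+.1f} points, "
              f"{relative_gain(old, new):+.1f}% relative")
```

A large relative gain on a low starting score, as with HiddenMath, can look more dramatic than the absolute change, so it is worth reporting both.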

Research

Technical reports

For developers

Build with Gemini

Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI.