Introducing 2.5 Flash-Lite, a thinking model for those looking for low cost and low latency.
2.5 Flash-Lite excels at high-volume, latency-sensitive tasks like translation and classification.
- Thinking, enabled: Experience improved reasoning and output quality with thinking mode and thinking budgets.
- Superior latency: Benefit from faster response times.
- Tool use: Use key Gemini 2.5 features, including tools such as Search and code execution.
- Cost-efficient: 2.5 Flash-Lite is our most cost-efficient 2.5 model yet.
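As a sketch of how the thinking budget and tool features above surface in the Gemini API, the snippet below builds a `generateContent` request body (REST-style JSON). The prompt text and the budget value of 512 are illustrative choices, not values from this announcement; the actual network call and API key are omitted.

```python
import json

# Preview model name used in this announcement's benchmark table.
MODEL = "gemini-2.5-flash-lite-preview-06-17"

def build_request(prompt: str, thinking_budget: int = 512) -> dict:
    """Build a JSON body for models/{MODEL}:generateContent.

    A thinking budget of 0 turns thinking off; a larger budget lets the
    model spend more tokens reasoning before it answers.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # Thinking mode with an explicit token budget.
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
        # Tool use: Grounding with Google Search and code execution.
        "tools": [{"google_search": {}}, {"code_execution": {}}],
    }

# Example: a high-volume classification-style prompt.
body = build_request("Classify this support ticket: 'My invoice total looks wrong.'")
print(json.dumps(body, indent=2))
```

The same configuration can be expressed through the Google Gen AI SDKs; this raw-dict form just makes the request shape explicit.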
Benchmarks
2.5 Flash-Lite delivers significantly higher performance than 2.0 Flash-Lite across the board on coding, math, science, reasoning, and multimodal benchmarks.
| Benchmark | Gemini 2.0 Flash | Gemini 2.5 Flash-Lite Preview 06-17 (non-thinking) | Gemini 2.5 Flash-Lite Preview 06-17 (thinking) |
|---|---|---|---|
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 5.1%* | 5.1% | 6.9% |
| Science: GPQA diamond | 65.2% | 64.6% | 66.7% |
| Mathematics: AIME 2025 | 29.7% | 49.8% | 63.1% |
| Code generation: LiveCodeBench (UI: 1/1/2025-5/1/2025) | 29.1% | 33.7% | 34.3% |
| Code editing: Aider Polyglot | 21.3% | 26.7% | 27.1% |
| Agentic coding: SWE-bench Verified, single attempt | 21.4% | 31.6% | 27.6% |
| Agentic coding: SWE-bench Verified, multiple attempts | 34.2% | 42.6% | 44.9% |
| Factuality: SimpleQA | 29.9% | 10.7% | 13.0% |
| Factuality: FACTS grounding | 84.6% | 84.1% | 86.8% |
| Visual reasoning: MMMU | 69.3% | 72.9% | 72.9% |
| Image understanding: Vibe-Eval (Reka) | 55.4% | 51.3% | 57.5% |
| Long context: MRCR v2 (8-needle), 128k (average) | 19.0% | 16.6% | 30.6% |
| Long context: MRCR v2 (8-needle), 1M (pointwise) | 5.3% | 4.1% | 5.4% |
| Multilingual performance: Global MMLU (Lite) | 83.4% | 81.1% | 84.5% |
Model information
| | 2.0 Flash-Lite | 2.5 Flash-Lite |
|---|---|---|
| Model deployment status | General availability | Preview |
| Supported data types for input | Text, Image, Video, Audio | Text, Image, Video, Audio, PDF |
| Supported data types for output | Text | Text |
| Supported # tokens for input | 1M | 1M |
| Supported # tokens for output | 8k | 64k |
| Knowledge cutoff | June 2024 | January 2025 |
| Tool use | — | Search as a tool, code execution |
| Best for | Low-cost workflows | High-volume, low-cost, low-latency tasks |
| Availability | Google AI Studio, Gemini API, Vertex AI | Google AI Studio, Gemini API, Vertex AI |