Thinking, enabled
Experience improved reasoning and output quality with thinking mode and thinking budgets.
Introducing 2.5 Flash-Lite, a thinking model for those looking for low cost and low latency.
Benefit from faster response times.
Utilize key Gemini 2.5 features, including tool use such as Search and code execution (see the API sketch below).
2.5 Flash-Lite is our most cost-efficient 2.5 model yet.
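A minimal sketch of how these features are exposed through the Gemini API, using the google-genai Python SDK; the model ID, prompt, and budget value below are illustrative assumptions rather than recommended settings.

```python
# Minimal sketch with the google-genai Python SDK (pip install google-genai).
# The model ID, prompt, and budget value are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Who won the most recent Formula 1 race, and by what margin?",
    config=types.GenerateContentConfig(
        # Cap how many tokens the model may spend on internal reasoning;
        # a budget of 0 turns thinking off for the lowest latency.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
        # Built-in Search grounding; code execution is enabled the same way
        # with types.Tool(code_execution=types.ToolCodeExecution()).
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```

In this sketch, the thinking budget is the knob behind the thinking versus non-thinking columns in the benchmark table below: a larger budget trades latency for reasoning depth.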
See how Gemini 2.5 Flash-Lite writes the code for a UI and its contents based solely on the context of the previous screen, all in the time it takes to click a button.
See how Gemini 2.5 Flash-Lite built a research prototype that can instantly transform large PDF files into interactive web apps, making it easier to summarize and understand dense information.
| Benchmark | Notes | Gemini 2.5 Flash-Lite Thinking | Gemini 2.5 Flash-Lite Non-thinking | Gemini 2.0 Flash |
|---|---|---|---|---|
| Reasoning & knowledge: Humanity's Last Exam (no tools) | | 6.9% | 5.1% | 5.1%* |
| Mathematics: AIME 2025 | | 63.1% | 49.8% | 29.7% |
| Code generation: LiveCodeBench | UI: 1/1/2025-5/1/2025 | 34.3% | 33.7% | 29.1% |
| Code editing: Aider Polyglot | | 27.1% | 26.7% | 21.3% |
| Agentic coding: SWE-bench Verified | single attempt | 27.6% | 31.6% | 21.4% |
| | multiple attempts | 44.9% | 42.6% | 34.2% |
| Factuality: SimpleQA | | 13.0% | 10.7% | 29.9% |
| Factuality: FACTS Grounding | | 86.8% | 84.1% | 84.6% |
| Visual reasoning: MMMU | | 72.9% | 72.9% | 69.3% |
| Image understanding: Vibe-Eval (Reka) | | 57.5% | 51.3% | 55.4% |
| Long context: MRCR v2 | 128k (average) | 30.6% | 16.6% | 19.0% |
| | 1M (pointwise) | 5.4% | 4.1% | 5.3% |
| Multilingual performance: Global MMLU (Lite) | | 84.5% | 81.1% | 83.4% |
Methodology
Gemini results: All Gemini scores are pass@1. "Single attempt" settings allow no majority voting or parallel test-time compute; "multiple attempts" settings allow test-time selection of the candidate answer. All runs use the AI Studio API with default sampling settings. To reduce variance, we average over multiple trials for smaller benchmarks. The Aider Polyglot score is the average pass rate over 3 trials. Vibe-Eval results are reported using Gemini as a judge. Google's scaffolding for "multiple attempts" on SWE-bench Verified includes drawing multiple trajectories and re-scoring them using the model's own judgement. Aider results differ from the official leaderboard due to a difference in the settings used for evaluation (non-default).
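As a rough illustration of the "multiple attempts" setting, the sketch below shows best-of-N selection in which candidate trajectories are re-scored by the judging model; run_attempt and judge are hypothetical stand-ins, not Google's actual scaffolding.

```python
import statistics


def best_of_n(task, n_attempts, run_attempt, judge):
    """Illustrative "multiple attempts" selection: draw several candidate
    trajectories and keep the one the judging model scores highest.
    run_attempt and judge are hypothetical callables, not the real harness."""
    candidates = [run_attempt(task) for _ in range(n_attempts)]
    return max(candidates, key=lambda candidate: judge(task, candidate))


def average_pass_rate(trial_pass_rates):
    """Average pass rate over repeated trials (e.g. 3 trials for Aider
    Polyglot); each entry is a single trial's pass rate in [0, 1]."""
    return statistics.mean(trial_pass_rates)
```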
Result sources: Where provider numbers are not available, we report numbers from leaderboards that report results on these benchmarks: Humanity's Last Exam results are sourced from https://agi.safe.ai/ and https://scale.com/leaderboard/humanitys_last_exam; LiveCodeBench results are from https://livecodebench.github.io/leaderboard.html (1/1/2025 - 5/1/2025 in the UI); Aider Polyglot numbers come from https://aider.chat/docs/leaderboards/; FACTS Grounding results come from https://www.kaggle.com/benchmarks/google/facts-grounding. For MRCR v2, which is not publicly available yet, we include the 128k result as a cumulative score so that it is comparable with other models, and a pointwise value at the 1M context window to show the model's capability at full length. The methodology in this table has changed relative to previously published MRCR v2 results, as we have decided to focus on a harder, 8-needle version of the benchmark going forward.
* These results are on an earlier HLE dataset, obtained from https://scale.com/leaderboard/humanitys_last_exam_preview