Flexible reasoning levels
Delivers improved reasoning and output quality, allowing users to select the level of thinking they want to use.
Introducing 3.1 Flash-Lite, a scalable thinking model for high-volume tasks at low cost and latency.
Tackles high-volume tasks with faster response times.
Delivers high throughput without sacrificing quality, with search grounding and enhanced instruction following.
Our most cost-efficient model yet in the 3 series.
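The "flexible reasoning levels" above can be sketched as a request-body helper. This is a minimal, hypothetical sketch: the `thinkingLevel` field and the supported values shown here are assumptions based on this page's description, not a definitive API reference — consult the official Gemini API docs before relying on them.

```python
import json

# Hypothetical sketch of selecting a thinking level for a generateContent
# request. Field names (`generationConfig`, `thinkingConfig`, `thinkingLevel`)
# and the allowed values are assumptions for illustration.

def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Build a request body with an explicit thinking level."""
    assert thinking_level in ("low", "high")  # assumed supported levels
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

body = build_request("Summarize this ticket in one sentence.", "low")
print(json.dumps(body, indent=2))
```

A lower thinking level trades some reasoning depth for latency and cost, which is the intended fit for high-volume tasks.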
Gemini 3.1 Flash-Lite populates an e-commerce wireframe with hundreds of products across multiple categories in seconds.
Gemini 3.1 Flash-Lite builds live weather dashboards on demand, pulling real-time forecasts and historical data into dynamic visualizations.
Google’s model has demonstrated unparalleled instruction-following capabilities and speed in its class, achieving a 20% higher success rate and 60% faster inference times than our previous model. It's enabling Latitude to deliver sophisticated storytelling to a much wider audience than would have otherwise been possible.
3.1 Flash-Lite is a remarkably competent model. It is lightning fast, but still somehow finds a way to follow all instructions. It is great at tool calling and can rapidly explore codebases in a fraction of the time of bigger models. We have a wide variety of multimodal labeling use cases at dramatic scale, and we’ve found Flash-Lite to be an unlock for our ability to bring insight to even more data at even larger scale. The intelligence-to-speed ratio is unparalleled in any other model.
By integrating 3.1 Flash-Lite into our classification pipeline, Whering has achieved 100% consistency in item tagging, providing a highly reliable foundation for our label assignment. 3.1 Flash-Lite’s ability to deliver certain, repeatable results, even on complex fashion categories, has streamlined our data labelling process and increased our confidence in the structured outputs.
As a root orchestration and content engine, 3.1 Flash-Lite consistently delivered sub-10 second completions with near-instant streaming, ~97% structured output compliance, and 94% intent routing accuracy. For high-throughput AI products, it offers an exceptional balance of speed, instruction fidelity, and cost efficiency.
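The classification workflows described above hinge on constraining the model to a fixed label set and validating its structured output. The sketch below is illustrative only, not any customer's actual pipeline: the categories, field names, and schema are invented for the example, and it validates a sample JSON reply locally rather than calling the API.

```python
import json

# Invented label set and schema for an item-tagging classifier.
CATEGORIES = {"tops", "bottoms", "dresses", "outerwear", "footwear", "accessories"}

ITEM_TAG_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": sorted(CATEGORIES)},
        "color": {"type": "string"},
    },
    "required": ["category", "color"],
}

def parse_tags(raw_reply: str) -> dict:
    """Parse a structured tagging reply and reject out-of-vocabulary labels."""
    tags = json.loads(raw_reply)
    if tags.get("category") not in CATEGORIES:
        raise ValueError(f"unexpected category: {tags.get('category')!r}")
    return tags

tags = parse_tags('{"category": "outerwear", "color": "navy"}')
print(tags["category"])  # outerwear
```

Passing a schema like `ITEM_TAG_SCHEMA` as the response schema of a generation request, then keeping a local check like `parse_tags` as a backstop, is one way to get the repeatable, enum-constrained labels the testimonials describe.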
| Benchmark | Notes | Gemini 3.1 Flash-Lite (High) | Gemini 2.5 Flash (Dynamic) | Gemini 2.5 Flash-Lite (Dynamic) | GPT-5 mini (High) | Claude 4.5 Haiku (Extended Thinking) | Grok 4.1 Fast (Reasoning) |
|---|---|---|---|---|---|---|---|
| Input price $/1M tokens, no caching | Lower is better | $0.25 | $0.30 | $0.10 | $0.25 | $1.00 | $0.20 |
| Output price $/1M tokens | Lower is better | $1.50 | $2.50 | $0.40 | $2.00 | $5.00 | $0.50 |
| Output speed Tokens / s | Higher is better | 363 | 249 | 366 | 71 | 108 | 145 |
| Humanity’s Last Exam Academic reasoning (full set, text + MM) | No tools | 16.0% | 11.0% | 6.9% | 16.7% | 9.7% | 17.6% |
| GPQA Diamond Scientific knowledge | No tools | 86.9% | 82.8% | 66.7% | 82.3% | 73.0% | 84.3% |
| MMMU-Pro Multimodal understanding and reasoning | No tools | 76.8% | 66.7% | 51.0% | 74.1% | 58.0% | 63.0% |
| CharXiv Reasoning Information synthesis from complex charts | | 73.2% | 63.7% | 55.5% | 75.5% (+ python) | 61.7% | 31.6% |
| Video-MMMU Knowledge acquisition from videos | | 84.8% | 79.2% | 60.7% | 82.5% | — | 74.6% |
| SimpleQA Verified Parametric knowledge | | 43.3% | 28.1% | 11.5% | 9.5% | 5.5% | 19.5% |
| FACTS Benchmark Suite Factuality benchmark across grounding, parametric, search, and MM | | 40.6% | 50.4% | 17.9% | 33.7% | 18.6% | 42.1% |
| MMMLU Multilingual Q&A | | 88.9% | 86.6% | 84.5% | 84.9% | 83.0% | 86.8% |
| LiveCodeBench Code generation (UI: 1/1/2025-5/1/2025) | | 72.0% | 62.6% | 34.3% | 80.4% | 53.2% | 76.5% |
| MRCR v2 (8-needle) Long context performance | 128k (average) | 60.1% | 54.3% | 30.6% | 52.5% | 35.3% | 54.6% |
| | 1M (pointwise) | 12.3% | 21.0% | 5.4% | Not supported | Not supported | 6.1% |