Large language models are the foundation of generative AI today. We're using a technique called diffusion to explore a new kind of language model that gives users greater control, creativity, and speed in text generation.
Traditional autoregressive language models generate text one word, or token, at a time. This sequential process can be slow and can limit the quality and coherence of the output.
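As a point of reference, here is a minimal sketch of that sequential loop; `model` is a hypothetical next-token predictor standing in for any causal language model, not a specific Gemini component.

```python
# Sketch of autoregressive decoding: one token per forward pass.
# `model` is a hypothetical callable that takes the token sequence so
# far and returns a list of next-token logits over the vocabulary.

def generate_autoregressive(model, prompt_tokens, max_new_tokens, eos_id):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # one full forward pass per new token
        next_token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(next_token)
        if next_token == eos_id:  # stop at end-of-sequence
            break
    return tokens
```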
Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise step by step. This means they can iterate on a solution very quickly and correct errors during the generation process, which helps them excel at tasks like editing, including in the context of math and code.
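To make the idea concrete, below is a minimal sketch of one common discrete-text variant of this process, confidence-based masked diffusion: generation starts from a fully masked ("noisy") sequence, and each refinement step predicts every position in parallel, commits the most confident predictions, and re-masks the rest so earlier choices can still be revised. `denoise_model`, the confidence scores, and the cosine schedule are illustrative assumptions, not a description of Gemini Diffusion's internals.

```python
import math

# Sketch of masked-diffusion text generation. `denoise_model` is a
# hypothetical callable that, given the current (partially masked)
# sequence, returns a (predicted_token, confidence) pair per position.

MASK = -1  # sentinel id for a still-noisy (masked) position

def generate_diffusion(denoise_model, prompt_tokens, length, num_steps):
    # Start from pure "noise": every position to be generated is masked.
    seq = list(prompt_tokens) + [MASK] * length
    start = len(prompt_tokens)
    for step in range(num_steps):
        preds = denoise_model(seq)  # predict all positions in parallel
        # How many positions stay masked after this step, following a
        # simple cosine schedule (an illustrative assumption).
        frac = math.cos(math.pi / 2 * (step + 1) / num_steps)
        keep_masked = int(frac * length)
        # Rank the generated positions by model confidence.
        ranked = sorted(range(start, len(seq)),
                        key=lambda i: preds[i][1], reverse=True)
        # Commit the most confident predictions; re-mask the rest, so
        # low-confidence earlier commitments can be corrected later.
        for i in ranked[: length - keep_masked]:
            seq[i] = preds[i][0]
        for i in ranked[length - keep_masked:]:
            seq[i] = MASK
    return seq
```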
This approach translates into three practical strengths:

- **Rapid response:** Generates content significantly faster than even our fastest model so far.
- **More coherent text:** Generates entire blocks of tokens at once, so it responds more coherently to a user's prompt than autoregressive models do.
- **Iterative refinement:** Corrects errors during generation for more consistent outputs.
| Domain | Benchmark | Gemini Diffusion | Gemini 2.0 Flash-Lite |
|---|---|---|---|
| Code | LiveCodeBench (v6) | 30.9% | 28.5% |
| Code | BigCodeBench | 45.4% | 45.8% |
| Code | LBPP (v2) | 56.8% | 56.0% |
| Code | SWE-Bench Verified* | 22.9% | 28.5% |
| Code | HumanEval | 89.6% | 90.2% |
| Code | MBPP | 76.0% | 75.8% |
| Science | GPQA Diamond | 40.4% | 56.5% |
| Mathematics | AIME 2025 | 23.3% | 20.0% |
| Reasoning | BIG-Bench Extra Hard | 15.0% | 21.0% |
| Multilingual | Global MMLU (Lite) | 69.1% | 79.0% |
Methodology
All scores are pass@1 (no majority voting). The Gemini 2.0 Flash-Lite results were obtained through the AI Studio API with the model ID gemini-2.0-flash-lite and the default sampling settings; a minimal sketch of that setup follows the footnotes below.
* Non-agentic evaluation (single-turn edits only), with a maximum prompt length of 32K.
Sampling speed, where reported, is averaged across the evals above.
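For reference, the baseline calls can be reproduced with the google-genai Python SDK roughly as follows; the API key and prompt string are placeholders, and omitting a generation config keeps the API's default sampling settings, matching the methodology above.

```python
from google import genai

# Reproduce the baseline setup: gemini-2.0-flash-lite via the AI Studio
# API with default sampling settings (no generation config overrides).
client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-lite",
    contents="<single benchmark prompt here>",  # one attempt per task (pass@1)
)
print(response.text)
```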