Gemini 3.1 Flash-Lite Image
Model Cards are intended to provide essential information on Gemini models, including known limitations, mitigation approaches, and safety performance. Model cards may be updated from time to time, for example to include updated evaluations as the model is improved or revised.
Published: June 2026
Model Information
Description
Gemini 3.1 Flash-Lite Image is a member of the Gemini series of models, a suite of highly-capable, natively multimodal reasoning models. Gemini 3.1 Flash-Lite Image can comprehend input from different information sources, including text, images, audio and video. Image and text output is generated in the response.
Model dependencies
Gemini 3.1 Flash-Lite Image is based on Gemini 3.1 Flash-Lite.
Inputs
Text strings (e.g., a prompt, document(s)) and images, with a token context window of up to 1M.
Outputs
Image, with a 4K token output, and text, with a 64K token output.
Architecture
Gemini 3.1 Flash-Lite Image is based on Gemini 3.1 Flash-Lite. For more information about the model architecture for Gemini 3.1 Flash-Lite, see the Gemini 3.1 Flash-Lite model card.
Model Data
Training Dataset
Gemini 3.1 Flash-Lite Image is based on Gemini 3.1 Flash-Lite. For more information about the training dataset for Gemini 3.1 Flash-Lite, see the Gemini 3.1 Flash-Lite model card.
Training Data Processing
For more information about the training data processing for Gemini 3.1 Flash-Lite Image, see the Gemini 3.1 Flash-Lite model card.
Implementation and Sustainability
Hardware
Gemini 3.1 Flash-Lite Image was trained using Google’s Tensor Processing Units (TPUs). TPUs are specifically designed to handle the massive computations involved in training LLMs and can speed up training considerably compared to CPUs. TPUs often come with large amounts of high-bandwidth memory, allowing for the handling of large models and batch sizes during training, which can lead to better model quality. TPU Pods (large clusters of TPUs) also provide a scalable solution for handling the growing complexity of large foundation models. Training can be distributed across multiple TPU devices for faster and more efficient processing.
The efficiencies gained through the use of TPUs are aligned with Google's commitment to operate sustainably.
Software
Training was done using JAX and ML Pathways.
Distribution
Gemini 3.1 Flash-Lite Image is based on Gemini 3.1 Flash-Lite. For more information about the distribution for Gemini 3.1 Flash-Lite, see the Gemini 3.1 Flash-Lite model card.
Evaluation
The following evaluation approach and results are for Gemini 3.1 Flash-Lite Image. For more information about the evaluation for Gemini 3.1 Flash-Lite, see the Gemini 3.1 Flash-Lite model card.
Approach
Gemini 3.1 Flash-Lite Image was evaluated using the methodology below:
- Capabilities / Benchmarks cover several different quality aspects of image generation, which fall into two broad categories:
  - Capability sets: diverse Text-to-Image (T2I) and editing evals are curated, covering a wide range of capabilities.
    - T2I: General Text-to-Image, i18n Text Rendering, Visual Design.
    - Editing: General Image Editing, Stylization, Character Editing, Object/Environment Editing, Factuality (world knowledge, EDU, etc.), Ink (doodle) based Editing, Multi-Image (Multi-Product Recontextualization, Multi-Character, etc.).
    - Multi-Turn: Covers both static and dynamic conversations.
  - Regression sets: popular use cases observed on Gemini 2.5 Flash Image, Gemini 3 Pro Image and Gemini 3.1 Flash Image, to ensure Gemini 3.1 Flash-Lite Image does not show any noticeable regressions.
- Eval Methodology
  - Side-by-side (SxS) human eval to get Elo scores across diverse T2I and Editing sets.
  - SxS human eval to get win rates for Multi-Turn.
  - Single-sided AutoRater on factuality and style diversity of non-natural images.
Results
Results for Gemini 3.1 Flash-Lite Image are below.
Capabilities: Text-to-Image
| Capability Benchmark | Gemini 3.1 Flash-Lite Image (Thinking) | Gemini 3.1 Flash-Lite Image (No thinking) | Gemini 3.1 Flash Image (“Nano Banana 2”) | Gemini 3 Pro Image (“Nano Banana Pro”) | Gemini 2.5 Flash Image (“Nano Banana”) | GPT 2 Response API Low | Grok Imagine Image Pro | Flux 2 Pro | Seadream v5 Lite 3k | Hunyuan v3 |
|---|---|---|---|---|---|---|---|---|---|---|
| General T2I | 1059.0 ± 7.0 | 1055.0 ± 6.0 | 1080.0 ± 6.0 | 1018.0 ± 5.0 | 929.0 ± 6.0 | 1122.0 ± 6.0 | 970.0 ± 7.0 | 908.0 ± 7.0 | 933.0 ± 7.0 | 804.0 ± 8.0 |
| Visual Design | 1047.0 ± 9.0 | 1027.0 ± 8.0 | 1066.0 ± 9.0 | 1009.0 ± 7.0 | 919.0 ± 9.0 | 1150.0 ± 12.0 | 1018.0 ± 11.0 | 957.0 ± 11.0 | 936.0 ± 11.0 | 704.0 ± 15.0 |
| Social Media Trends - T2I | 1030.0 ± 13.0 | 1056.0 ± 16.0 | 1050.0 ± 14.0 | 1040.0 ± 12.0 | 924.0 ± 12.0 | 1113.0 ± 19.0 | 1022.0 ± 15.0 | 936.0 ± 16.0 | 916.0 ± 18.0 | 831.0 ± 20.0 |
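
As a rough reading aid for the Elo scores above, the sketch below converts the gap between two scores into an expected preference rate. It assumes the conventional logistic Elo model with a 400-point scale; this model card does not state which Elo variant or scale was used, so the function `expected_win_rate` and its `scale` parameter are illustrative assumptions, not the evaluation's actual method.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B.

    Assumes the conventional logistic Elo model (base 10, 400-point scale).
    The model card does not confirm that this is the variant used.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


# General T2I scores copied from the table above.
flash_lite_thinking = 1059.0  # Gemini 3.1 Flash-Lite Image (Thinking)
flash_2_5 = 929.0             # Gemini 2.5 Flash Image

rate = expected_win_rate(flash_lite_thinking, flash_2_5)
print(f"Expected preference rate: {rate:.2f}")
```

Under these assumptions, equal scores give a rate of 0.5, and the rate depends only on the difference between the two scores, not on their absolute values.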
Capabilities: Editing
| Capability Benchmark | Gemini 3.1 Flash-Lite Image (Thinking) | Gemini 3.1 Flash-Lite Image (No thinking) | Gemini 3.1 Flash Image (“Nano Banana 2”) | Gemini 3 Pro Image (“Nano Banana Pro”) | Gemini 2.5 Flash Image (“Nano Banana”) | GPT 2 Response API Low | Grok Imagine Image Pro | Flux 2 Pro | Seadream v5 Lite 3k | Hunyuan v3 |
|---|---|---|---|---|---|---|---|---|---|---|
| General Editing | 983.0 ± 9.0 | 954.0 ± 9.0 | 1062.0 ± 8.0 | 1054.0 ± 7.0 | 893.0 ± 8.0 | 1122.0 ± 12.0 | 1007.0 ± 11.0 | 876.0 ± 12.0 | 942.0 ± 10.0 | 931.0 ± 10.0 |
| Obj/Env Editing | 996.0 ± 9.0 | 993.0 ± 8.0 | 1041.0 ± 7.0 | 1059.0 ± 9.0 | 978.0 ± 9.0 | 1069.0 ± 14.0 | 996.0 ± 11.0 | 875.0 ± 11.0 | 957.0 ± 9.0 | 1013.0 ± 10.0 |
| Character Editing | 1026.0 ± 6.0 | 1018.0 ± 5.0 | 1044.0 ± 6.0 | 1049.0 ± 6.0 | 924.0 ± 5.0 | 1054.0 ± 8.0 | 1015.0 ± 8.0 | 842.0 ± 7.0 | 914.0 ± 7.0 | 936.0 ± 7.0 |
| Stylization | 972.0 ± 8.0 | 953.0 ± 8.0 | 1046.0 ± 9.0 | 1054.0 ± 7.0 | 880.0 ± 10.0 | 1030.0 ± 7.0 | 948.0 ± 9.0 | 1012.0 ± 8.0 | 1002.0 ± 7.0 | 1067.0 ± 9.0 |
| Multi Input (up to 5) | 980.0 ± 8.0 | 974.0 ± 9.0 | 1045.0 ± 8.0 | 1044.0 ± 7.0 | 907.0 ± 8.0 | 1121.0 ± 10.0 | 973.0 ± 10.0 | 889.0 ± 11.0 | 946.0 ± 9.0 | 940.0 ± 10.0 |
| Social Media Trends - Editing | 996.0 ± 13.0 | 968.0 ± 15.0 | 1036.0 ± 12.0 | 1025.0 ± 11.0 | 1009.0 ± 11.0 | 1120.0 ± 20.0 | 1031.0 ± 14.0 | 898.0 ± 16.0 | 919.0 ± 13.0 | 921.0 ± 13.0 |
| Text Editing | 961.0 ± 10.0 | 968.0 ± 10.0 | 1107.0 ± 10.0 | 1076.0 ± 8.0 | 823.0 ± 11.0 | 1178.0 ± 16.0 | 1060.0 ± 11.0 | 808.0 ± 13.0 | 968.0 ± 9.0 | 941.0 ± 10.0 |
| Multi Character (up to 5) Editing | 1020.0 ± 8.0 | 1034.0 ± 9.0 | 1103.0 ± 8.0 | 1135.0 ± 10.0 | 802.0 ± 10.0 | 1122.0 ± 11.0 | 1010.0 ± 10.0 | 779.0 ± 13.0 | 930.0 ± 11.0 | 861.0 ± 11.0 |
| Doodle Editing | 990.0 ± 7.0 | 985.0 ± 7.0 | 1080.0 ± 8.0 | 1043.0 ± 7.0 | 959.0 ± 8.0 | 1098.0 ± 13.0 | 938.0 ± 9.0 | 983.0 ± 9.0 | 1041.0 ± 9.0 | 883.0 ± 9.0 |
| Multi Product (up to 14) Editing | 1029.0 ± 8.0 | 1024.0 ± 9.0 | 1084.0 ± 10.0 | 1098.0 ± 9.0 | 940.0 ± 8.0 | 1122.0 ± 14.0 | 967.0 ± 10.0 | 885.0 ± 11.0 | 971.0 ± 10.0 | 900.0 ± 10.0 |
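
When comparing two cells in the tables above, the ± margins matter as much as the scores. The sketch below parses cells of the form `score ± margin` and checks whether the two ranges overlap. It assumes the ± value is a symmetric uncertainty margin around the score, which this model card does not define. The helper names `parse_cell` and `intervals_overlap` are hypothetical, and an overlap check is only a coarse heuristic, not a significance test.

```python
import re
from typing import Tuple


def parse_cell(cell: str) -> Tuple[float, float]:
    """Split a table cell such as '1059.0 ± 7.0' into (score, margin)."""
    match = re.fullmatch(r"\s*([\d.]+)\s*±\s*([\d.]+)\s*", cell)
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def intervals_overlap(cell_a: str, cell_b: str) -> bool:
    """Return True if the two 'score ± margin' ranges overlap.

    Assumes each margin is symmetric around its score (not stated in the card).
    """
    score_a, margin_a = parse_cell(cell_a)
    score_b, margin_b = parse_cell(cell_b)
    return abs(score_a - score_b) <= margin_a + margin_b


# Thinking vs. No thinking, values copied from the Editing table above.
print(intervals_overlap("961.0 ± 10.0", "968.0 ± 10.0"))  # Text Editing
print(intervals_overlap("983.0 ± 9.0", "954.0 ± 9.0"))    # General Editing
```

Under this assumption, the Text Editing ranges for the Thinking and No thinking variants overlap, so that gap may not be meaningful, while the General Editing ranges do not overlap.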
Intended Usage and Limitations
Benefit and Intended Usage
Gemini 3.1 Flash-Lite Image is capable of using Gemini’s real-world knowledge to deliver precise results and reflect the world around you, from complex infographics to historically accurate scenes. It is well-suited for applications that require:
- creation and editing of images with professional levels of precision and control and multiple, quick iterations
- generation of clear text for posters and intricate diagrams
- long context real-world knowledge
- localized text rendering across several languages
- studio-quality control
Known Limitations
Gemini 3.1 Flash-Lite Image may exhibit some of the general limitations of foundation models, such as hallucinations. There may also be occasional slowness or timeout issues.
Gemini 3.1 Flash-Lite Image still has room for several quality improvements:
- Text rendering: weak for small text (often blurry in the 1k model), long paragraphs, and page-length text
- Character consistency is not always perfect between input images and the generated output image
- Masked/Doodle based editing: partial instruction following and persistent ink
- When editing images: infrequent copying/pasting from the user's input image into the generated image
- Occasional confusion around spatial localisation (e.g. left/right etc.)
- Still limited in advanced capabilities with world knowledge, 3D reasoning and factuality
The knowledge cutoff date for Gemini 3.1 Flash-Lite Image was January 2025.
Acceptable Usage
For more information about the acceptable usage for Gemini 3.1 Flash-Lite Image, see the Gemini 3.1 Flash-Lite model card.
Ethics and Content Safety
Evaluation Approach
Gemini 3.1 Flash-Lite Image was developed in partnership with internal safety and responsibility teams. A range of evaluations and red teaming activities were conducted to help improve the model and inform decision-making. These evaluations and activities align with Google's AI Principles and responsible AI approach, as well as Google's Generative AI policies (e.g. Gen AI Prohibited Use Policy and the Gemini API Additional Terms of Service). As Gemini 3.1 Flash-Lite Image is based on Gemini 3.1 Flash-Lite, see the Gemini 3.1 Flash-Lite model card for additional Ethics & Content Safety details.
Evaluation types included but were not limited to:
- Training/Development Evaluations including automated and human evaluations carried out continuously throughout and after the model’s training, to monitor its progress and performance;
- Human Red Teaming conducted by specialist teams across the policies and desiderata, deliberately trying to spot weaknesses and ensure the model adheres to safety policies and desired outcomes;
- Trust Assurance Evaluations conducted by evaluators who sit outside of the model development team, used to independently assess responsibility and safety governance decisions;
- Ethics & Safety Reviews conducted ahead of the model’s release.
Safety Policies
Gemini’s safety policies are based on Google’s standard framework, which aims to prevent our Generative AI models from generating harmful content, including:
- Content related to child sexual abuse material and exploitation
- Hate speech (e.g. dehumanizing members of protected groups)
- Dangerous content (e.g. promoting suicide, or instructing in activities that could cause real-world harm)
- Harassment (e.g. encouraging violence against people)
- Sexually explicit content
- Medical advice that runs contrary to scientific or medical consensus
We continue to improve our internal evaluations, including refining automated evaluations to reduce false positives and negatives, as well as updating query sets to ensure balance and maintain a high standard of results.
Frontier Safety Assessment
Gemini 3.1 Flash-Lite Image is part of the Gemini 3 family of models. We evaluated Gemini 3.1 Pro for frontier safety as it was the most generally capable model as of publication of this model card, and it did not reach any Critical Capability Levels (CCLs) outlined in our Frontier Safety Framework. Our assessments have shown that Gemini 3.1 Flash-Lite Image is less capable than Gemini 3.1 Pro. Based on the Gemini 3.1 Pro results, we are therefore confident that Gemini 3.1 Flash-Lite Image is also unlikely to reach any CCLs. For more information, read the Gemini 3.1 Pro Model Card.
Risks and Mitigations
Safety and responsibility were built into Gemini 3.1 Flash-Lite Image throughout the training and deployment lifecycle, including pre-training, post-training, and product-level mitigations. Mitigations include, but are not limited to:
- dataset filtering;
- conditional pre-training;
- supervised fine-tuning;
- reinforcement learning from human and critic feedback;
- safety policies and desiderata;
- product-level mitigations such as safety filtering.