Introducing
Gemini 1.5
Our next-generation model
Gemini 1.5 delivers dramatically enhanced performance with a more efficient architecture. The first model we’ve released for early testing, Gemini 1.5 Pro, introduces a breakthrough experimental feature in long-context understanding.
Gemini comes in three model sizes
Ultra
Our most capable and largest model for highly complex tasks.
Pro
Our best model for scaling across a wide range of tasks.
Nano
Our most efficient model for on-device tasks.
Meet the first version of Gemini, our most capable AI model.
| | MMLU score |
|---|---|
| Gemini 1.0 Ultra | 90.0% |
| Human expert (MMLU) | 89.8% |
| Previous SOTA (GPT-4), 5-shot* (reported) | 86.4% |

*Note that evaluations of previous SOTA models use different prompting techniques.
Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods for testing the knowledge and problem-solving abilities of AI models.
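MMLU is scored as plain multiple-choice accuracy across its 57 subjects. As a rough illustration (this is not Google's evaluation harness, and the function and sample data below are hypothetical), the scoring reduces to:

```python
# Illustrative sketch of MMLU-style scoring: accuracy over
# multiple-choice answers. Not the official evaluation code.

def mmlu_accuracy(predictions, answer_keys):
    """Fraction of questions where the predicted choice matches the key."""
    if len(predictions) != len(answer_keys):
        raise ValueError("prediction/answer length mismatch")
    correct = sum(p == a for p, a in zip(predictions, answer_keys))
    return correct / len(answer_keys)

# Hypothetical example: four questions with choices labeled A-D.
preds = ["B", "C", "A", "D"]
keys = ["B", "C", "A", "A"]
print(mmlu_accuracy(preds, keys))  # 0.75
```

The reported 90.0% is this kind of accuracy aggregated over the full benchmark.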
Gemini 1.0 Ultra surpasses state-of-the-art performance on a range of benchmarks including text and coding.
TEXT

| Capability | Benchmark (higher is better) | Description | Gemini 1.0 Ultra | GPT-4* |
|---|---|---|---|---|
| General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90% | 86.4% |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% | 83.1% |
| | DROP | Reading comprehension (F1 score) | 82.4 | 80.9 |
| | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% | 95.3% |
| Math | GSM8K | Basic arithmetic manipulations (incl. Grade School math problems) | 94.4% | 92% |
| | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% | 52.9% |
| Code | HumanEval | Python code generation | 74.4% | 67% |
| | Natural2Code | Python code generation on a new held-out, HumanEval-like dataset not leaked on the web | 74.9% | 73.9% |

*GPT-4 API numbers calculated where reported numbers were missing.
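The coding rows above are scored functionally: a generated solution counts only if it passes the task's unit tests. A minimal sketch of that idea (hypothetical toy task, not the official HumanEval harness, which sandboxes execution):

```python
# Sketch of HumanEval-style functional scoring: a candidate completion
# "passes" a task if its unit tests run without raising. Toy task only;
# the real benchmark executes untrusted code in a sandbox.

def passes(candidate_src, test_src):
    """Exec a candidate solution, then its tests; True if nothing raises."""
    env = {}
    try:
        exec(candidate_src, env)
        exec(test_src, env)
        return True
    except Exception:
        return False

task_tests = "assert add(2, 3) == 5 and add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
print(passes(good, task_tests), passes(bad, task_tests))  # True False
```

The headline percentage is then the fraction of tasks whose sampled completion passes.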
Our Gemini 1.0 models surpass state-of-the-art performance on a range of multimodal benchmarks.
MULTIMODAL

| Capability | Benchmark | Description (higher is better unless otherwise noted) | Gemini | GPT-4V* |
|---|---|---|---|---|
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% | 56.8% |
| | VQAv2 | Natural image understanding | 77.8% | 77.2% |
| | TextVQA | OCR on natural images | 82.3% | 78% |
| | DocVQA | Document understanding | 90.9% | 88.4% |
| | Infographic VQA | Infographic understanding | 80.3% | 75.1% |
| | MathVista | Mathematical reasoning in visual contexts | 53% | 49.9% |
| Video | VATEX | English video captioning (CIDEr) | 62.7 | 56 |
| | Perception Test MCQA | Video question answering | 54.7% | 46.3% |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1 | 29.1 |
| | FLEURS (62 languages) | Automatic speech recognition (word error rate; lower is better) | 7.6% | 17.6% |

*Previous SOTA model listed when capability is not supported in GPT-4V.
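The FLEURS row uses word error rate (WER), where lower is better: the word-level edit distance between the transcribed hypothesis and the reference, divided by the reference length. A self-contained sketch of the standard computation (example strings are made up):

```python
# Word error rate: Levenshtein distance over words between the
# hypothesis and reference transcripts, divided by reference length.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic program for word-level edit distance.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[-1] / len(ref)

# One dropped word out of six reference words: WER ≈ 0.167.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

A 7.6% WER thus means roughly one word-level error per thirteen reference words, averaged over the evaluation set.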
The potential of Gemini
Learn about what our Gemini models can do from some of the people who built them.
Gemma Open Models
A family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.