PaliGemma 2
A family of lightweight, open vision-language models that can interpret text and image inputs
PaliGemma 2 combines the SigLIP-So400m vision encoder with the Gemma 2 language model. It is available in 3B, 10B, and 28B parameter sizes and is trained at multiple input resolutions, giving it broad knowledge that transfers to new tasks through fine-tuning.
Model categories
- PaliGemma PT: General-purpose pre-trained models that can be fine-tuned on a variety of tasks.
- PaliGemma FT: Research-oriented models that are fine-tuned on specific research datasets.
- PaliGemma 2 mix: Models tuned on a mixture of tasks that can be used out of the box for common use cases.
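To make the size, category, and resolution choices concrete, here is a minimal sketch that enumerates candidate checkpoint names. The `google/paligemma2-{size}-{category}-{resolution}` naming pattern and the 224/448/896 resolutions are assumptions based on how the checkpoints appear to be published on Hugging Face; `checkpoint_id` is a hypothetical helper, and not every combination it produces is guaranteed to exist. FT checkpoints are left out because their names appear to include a dataset name as well.

```python
from itertools import product

SIZES = ("3b", "10b", "28b")      # parameter sizes named in the text above
CATEGORIES = ("pt", "mix")        # pre-trained and mix categories
RESOLUTIONS = (224, 448, 896)     # assumption: input resolutions in pixels


def checkpoint_id(size: str, category: str, resolution: int, org: str = "google") -> str:
    """Build a candidate checkpoint name (hypothetical helper, assumed naming pattern)."""
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return f"{org}/paligemma2-{size}-{category}-{resolution}"


# Every size/category/resolution combination; check availability before relying on one.
all_ids = [checkpoint_id(s, c, r) for s, c, r in product(SIZES, CATEGORIES, RESOLUTIONS)]
```

A smaller size at a lower resolution is usually the cheaper starting point for experiments; larger sizes and higher resolutions cost more memory and compute.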
Capabilities
- Multimodal input: Answers questions about images or short videos with detail and context.
- Versatile base models: Supports fine-tuning across a range of model sizes and resolutions for tailored vision-language capabilities.
- Off-the-shelf exploration: Comes with checkpoints fine-tuned on a mixture of specialized tasks for immediate use.
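As an illustration of using a mix checkpoint off the shelf, the sketch below builds a task prompt and runs it through the Hugging Face Transformers classes for PaliGemma. The prompt templates (`caption en`, `answer en ...`, `detect ...`, `ocr`), the default checkpoint name `google/paligemma2-3b-mix-224`, and the helpers `build_prompt` and `run_mix` are assumptions made for illustration, not something this page specifies. Running `run_mix` requires `torch`, `transformers`, and `Pillow`, plus access to the model weights.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Return a task prompt in the assumed PaliGemma style (hypothetical helper)."""
    templates = {
        "caption": f"caption {lang}",
        "answer": f"answer {lang} {text}",
        "detect": f"detect {text}",
        "ocr": "ocr",
    }
    if task not in templates:
        raise ValueError(f"unsupported task: {task!r}")
    return templates[task].strip()


def run_mix(image_path: str, prompt: str,
            model_id: str = "google/paligemma2-3b-mix-224",  # assumed checkpoint name
            max_new_tokens: int = 32) -> str:
    """Generate text for one image and prompt; heavy imports are deferred to call time."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Decode only the newly generated tokens, skipping the echoed prompt.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

For example, `run_mix("photo.jpg", build_prompt("answer", "What is in the image?"))` would ask a question about a local image, where `photo.jpg` is a placeholder path.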