Gemma Scope

A set of interpretability tools built to help researchers understand the inner workings of Gemma models.

Examine individual model layers to help address critical concerns, including hallucinations, bias, and manipulation.

Sparse autoencoders (SAEs) act as microscopes to inspect layer-specific representations and help pinpoint the source of issues.
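To make this concrete, here is a minimal sketch of the idea behind an SAE: it encodes one token's activation at a chosen layer into a much wider, mostly-zero set of feature activations, then reconstructs the original activation from those features. Gemma Scope's SAEs use a JumpReLU-style activation; the dimensions, random weights, and threshold below are placeholders standing in for a released, pretrained SAE, not the real thing.

```python
import numpy as np

# Toy dimensions for illustration; real Gemma Scope SAEs use the model's
# hidden size (e.g. 2304 for Gemma 2 2B) and a much wider feature dictionary.
D_MODEL, N_FEATURES = 256, 2048
rng = np.random.default_rng(0)

# Placeholder parameters; in practice these come from a released, pretrained SAE.
W_enc = rng.normal(scale=0.02, size=(D_MODEL, N_FEATURES))
b_enc = np.zeros(N_FEATURES)
W_dec = rng.normal(scale=0.02, size=(N_FEATURES, D_MODEL))
b_dec = np.zeros(D_MODEL)
threshold = np.full(N_FEATURES, 0.1)  # per-feature JumpReLU threshold

def sae_encode(x):
    """Map one token's layer activation to sparse feature activations."""
    pre = x @ W_enc + b_enc
    # JumpReLU: a feature is active only if its pre-activation clears its threshold.
    return np.where(pre > threshold, pre, 0.0)

def sae_decode(features):
    """Reconstruct the original activation from the sparse features."""
    return features @ W_dec + b_dec

x = rng.normal(size=D_MODEL)          # stand-in for a residual-stream activation
features = sae_encode(x)
reconstruction = sae_decode(features)
print("active features:", np.count_nonzero(features), "of", N_FEATURES)
```

Because only a handful of features fire for any given activation, each active feature is a human-inspectable clue about what the layer is representing at that point.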

With Gemma Scope 2, researchers can use transcoders to analyze complex, multi-step behaviors, from diagnosing jailbreaks and refusal mechanisms to verifying the faithfulness of chain-of-thought reasoning.
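The sketch below, under the same placeholder assumptions as before, illustrates how a transcoder differs from an SAE: instead of reconstructing the activation it reads, it approximates an MLP block's input-to-output map through a sparse feature layer, which is what lets researchers chain features from layer to layer when tracing multi-step behaviors.

```python
import numpy as np

# Toy dimensions; placeholders standing in for released transcoder weights.
D_MODEL, N_FEATURES = 256, 2048
rng = np.random.default_rng(1)
W_enc = rng.normal(scale=0.02, size=(D_MODEL, N_FEATURES))
b_enc = np.zeros(N_FEATURES)
W_dec = rng.normal(scale=0.02, size=(N_FEATURES, D_MODEL))
b_dec = np.zeros(D_MODEL)

def transcoder(mlp_input):
    """Approximate an MLP block's input-to-output map through sparse features."""
    features = np.maximum(mlp_input @ W_enc + b_enc, 0.0)  # sparse feature activations
    mlp_output_hat = features @ W_dec + b_dec              # predicted MLP output
    return mlp_output_hat, features

mlp_in = rng.normal(size=D_MODEL)     # stand-in for what the MLP block reads
mlp_out_hat, features = transcoder(mlp_in)
# Because each feature links an input pattern to an output contribution,
# features can be chained across layers to trace multi-step behaviors.
print("active features:", np.count_nonzero(features))
```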

This visual shows Gemma Scope 2 using sparse autoencoders and transcoders to reveal how the model determines whether an email is potentially fraudulent.

To understand the AI’s thought process, Gemma Scope deconstructs the Gemma model into millions of individual components known as features. A feature is essentially a detector that lights up whenever the AI recognizes a specific concept. For instance, inputting the phrase 'I like cats' triggers the feature responsible for identifying cat-related words.
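A hedged sketch of what that looks like in practice: given the feature activations an SAE produces for a prompt like 'I like cats' (faked here with random values rather than a real model run), a researcher ranks features by how strongly they fire and then inspects the top ones to see which concepts they detect.

```python
import numpy as np

N_FEATURES = 2048
rng = np.random.default_rng(2)

# Hypothetical: these activations would come from running "I like cats"
# through Gemma and an SAE at a chosen layer; here they are faked.
feature_acts = np.zeros(N_FEATURES)
active = rng.choice(N_FEATURES, size=20, replace=False)
feature_acts[active] = rng.gamma(shape=2.0, size=20)

# Rank features by how strongly they fire on the prompt.
top = np.argsort(feature_acts)[::-1][:5]
for idx in top:
    print(f"feature {idx}: activation {feature_acts[idx]:.2f}")
# A researcher would then inspect each top feature (e.g. its most strongly
# activating examples) to see whether it detects cat-related words.
```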