Hirundo achieves superior security hardening against adversarial vulnerabilities with Gemma 4

Trained with weight-level defenses, Hirundo’s E4B variant delivers elite-tier protection that outclasses models over 100 times its size.

As Large Language Models (LLMs) move into production, prompt injection attacks—where adversaries manipulate inputs to override system instructions—remain a persistent security challenge. Traditional defenses often rely on “bigger is better” logic or fragmented guardrails.

Hirundo, an AI safety platform, challenged this assumption by building advanced, weight-level resistance directly onto the foundational architecture of Gemma 4. As a result, they demonstrated that a compact model can outperform raw scale, delivering production-grade security without sacrificing the speed and cost-efficiency of a smaller footprint.

Adversarial robustness at scale

Enterprises deploying LLMs face a difficult trade-off between model capability, cost, and security. While models over 100B+ parameters are often assumed to be more robust due to their scale, they remain susceptible to sophisticated “jailbreak” techniques that bypass safety training. Alternatively, while smaller models are appreciated for their efficiency, developers historically faced challenges implementing defense layers without sacrificing general utility.

Hirundo sought to prove that security is not a function of parameter count, but of the ability to apply precise behavioral control to an established, efficient model architecture. By targeting the specific weights susceptible to adversarial manipulation, they create a “secure-by-design” model without the latency or compute costs of massive architectures.

Preserving utility while hardening security

Instead of adding external filters that slow down inference and often follow rule-based logic, Hirundo applied structural safety alignment directly to the instruction-tuned base model. This process involves identifying and excising the internal representations that make the model comply with adversarial prompts, effectively “forgetting” susceptibility to prompt injection at the weight level. Gemma 4 E4B IT provided an optimal combination of performance, size, and baseline safety alignment, enabling rapid, secure iteration in resource-constrained environments.

A common fear with aggressive security hardening is the alignment tax, an expected degradation in general performance capabilities. Hirundo’s weight-optimization process achieved a 74.47% reduction in successful attacks relative to the base model, resulting in a final Attack Success Rate (ASR) of 4.78%. Crucially, this hardening strictly preserved the model’s high performance across standard utility benchmarks, including AIME25, LiveCodeBench, GPQA, IFBench, and SCICode, as well as benign benchmarks AutoPatchBench and CyberSOCEval.

Outperforming 600B+ models: efficiency beats scale

To validate the approach, Hirundo benchmarked their hardened Gemma model against industry-leading open-weights models using PurpleLlama CyberSecEval, a responsible AI benchmark suite that evaluates cybersecurity risks. Significantly larger models complied with adversarial overrides at a higher rate, failing to preserve their original system instructions under identical pressure.

The data highlights a critical security insight: raw scale alone offers little protection against targeted jailbreaks. DeepSeek V3.2-Exp, a 685B parameter model, exhibited a 73.33% failure rate—15.6x worse than the hardened Gemma model. Similarly, despite being 30 times larger, GPT-OSS-120B lagged behind with more than 3x the attack success rate, while the 235B Qwen model proved 10.8x more vulnerable.

By pairing the efficiency of Gemma with targeted security hardening, developers can now deploy models that are not only faster and cheaper to run but fundamentally more secure than larger models.

Download model

More from the Gemmaverse

View all

The Ministry of Economy, Ecology and Agriculture of Ukraine digitizes licensing process with Gemma

Learn more

Adaptive ML trains Gemma 3 for exceptional multilingual results

Learn more

Quarks improves user experiences with Gemma 2 and Gemma 3

Learn more

Sarvam AI built a translation model with Gemma 3 to translate all 22 officially recognized Indian languages

Learn more

Institute of Science Tokyo creates powerful Japanese-focused LLM with Gemma 2

Learn more

Introducing GAIA, a Brazilian Portuguese Gemma 3 model developed with ABRIA, CEIA, Nama, and Amadeus AI

Learn more

Explore our next generation AI systems

Our latest AI breakthroughs and updates from the lab

Unlocking a new era of discovery with AI

Our mission is to build AI responsibly to benefit humanity

Hirundo achieves superior security hardening against adversarial vulnerabilities with Gemma 4

Adversarial robustness at scale

Preserving utility while hardening security

Outperforming 600B+ models: efficiency beats scale

More from the Gemmaverse

The Ministry of Economy, Ecology and Agriculture of Ukraine digitizes licensing process with Gemma

Adaptive ML trains Gemma 3 for exceptional multilingual results

Quarks improves user experiences with Gemma 2 and Gemma 3

Sarvam AI built a translation model with Gemma 3 to translate all 22 officially recognized Indian languages

Institute of Science Tokyo creates powerful Japanese-focused LLM with Gemma 2

Introducing GAIA, a Brazilian Portuguese Gemma 3 model developed with ABRIA, CEIA, Nama, and Amadeus AI