Swahili Gemma 1B from Crane AI Labs bridges the gap with powerful on-device performance
Crane AI Labs is building sovereign, offline-first AI infrastructure for Sub-Saharan Africa, with models designed to bridge the region's language and connectivity gaps. Their first model, Swahili Gemma 1B, focuses on one of the many languages under-represented in AI, one spoken by millions of people with limited internet access. Crane AI Labs chose Gemma 3 1B for its balance of performance, size, and memory footprint, which makes on-device inference a reality.
The challenge
Many areas in Sub-Saharan Africa struggle to adopt AI tools due to the region's limited internet infrastructure. Crane AI Labs estimates that 3.8 billion people worldwide primarily access the internet via low-cost, low-power mobile devices, on which cloud-based AI applications perform unreliably.
Another challenge is a lack of high-quality, culturally specific datasets for regional languages. Generalist models perform poorly on African languages like Swahili, making them difficult to use even for those with a strong internet connection and completely inaccessible for those without.
For Crane AI Labs' co-founders Kato Steven Mubiru and Bakunga Bronson, bridging this language gap is personal. They see AI's potential to unlock a wealth of educational resources, but recognize that language and connectivity barriers prevent millions from accessing that knowledge. The team at Crane AI Labs believes their work can play a meaningful part in changing this reality.
The solution
To succeed, the Crane AI Labs team needed a model that could run quickly and accurately on low-power, offline devices. After rigorous testing, they found that Gemma 3 1B offered the best Swahili translation accuracy for its size.
Gemma 3 1B provided the perfect balance of size and capability, allowing us to build high-quality models that could actually run on the devices our users own.
Kato Steven Mubiru, Crane AI Labs Co-founder
The developers at Crane AI Labs then fine-tuned Gemma 3 1B, leveraging Unsloth for its memory-efficient QLoRA implementation. They used Crane AI Labs' hybrid dataset, which combines synthetically generated text with validation and corrections from local language experts. The final model, Swahili Gemma 1B, was then prepared for its on-device deployment pipeline.
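To make that step concrete, here is a minimal sketch of what a QLoRA fine-tune with Unsloth can look like. It is illustrative only: the dataset file, hyperparameters, and training arguments are placeholder assumptions, not Crane AI Labs' actual recipe, and argument names can vary slightly across trl versions.

```python
# Illustrative QLoRA fine-tuning sketch with Unsloth -- not Crane AI Labs' exact recipe.
# Dataset path, hyperparameters, and column names are placeholder assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load Gemma 3 1B in 4-bit so the fine-tune fits in modest GPU memory (QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-1b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach low-rank adapters; only these small matrices are updated during training.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical JSONL file with a "text" column of formatted Swahili examples.
dataset = load_dataset("json", data_files="swahili_pairs.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="swahili-gemma-1b-lora",
    ),
)
trainer.train()
```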
For Crane AI Labs' mobile app, TANO, the team used the Google AI Edge Gallery, converting the model to the mobile-optimized LiteRT format and reducing its memory footprint with minimal accuracy loss. To empower the broader community, a separate PC version was converted to the popular GGUF format and quantized to Q4 and Q8.
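The exact LiteRT export pipeline for an LLM like Gemma is involved (Google's ai-edge-torch library ships dedicated generative-model converters for it), but the basic convert-and-export pattern looks like the sketch below, shown here on a stand-in PyTorch module rather than the actual model.

```python
# General ai-edge-torch pattern for producing a LiteRT (.tflite) artifact.
# A toy module stands in for the real model; converting Gemma itself uses the
# library's dedicated generative-model pipeline, so treat this as the shape of the step.
import torch
import ai_edge_torch

class TinyClassifier(torch.nn.Module):
    """Stand-in model used only to illustrate conversion."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.softmax(self.linear(x), dim=-1)

model = TinyClassifier().eval()
sample_inputs = (torch.randn(1, 16),)        # example input the converter traces
edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("tiny_classifier.tflite")  # LiteRT flatbuffer, loadable on-device
```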
All of this work enables easy integration with tools like llama.cpp, Cactus, Jan AI, LM Studio, and Ollama.
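As one example of that integration, a quantized GGUF build can be run fully offline through llama.cpp's Python bindings. The filename below is a placeholder for whichever Q4 or Q8 artifact you have locally.

```python
# Running a quantized GGUF build offline via llama-cpp-python (llama.cpp bindings).
# The model path is a placeholder; substitute your downloaded Q4/Q8 file.
from llama_cpp import Llama

llm = Llama(model_path="swahili-gemma-1b.Q4_K_M.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Tafsiri kwa Kiswahili: Good morning, how are you?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```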
Swahili Gemma 1B performance nears or surpasses that of larger models.
The impact
After fine-tuning, Swahili Gemma 1B significantly outperformed similarly sized models, achieving high scores on both BLEU (Bilingual Evaluation Understudy) and chrF++ evaluations. Remarkably, Swahili Gemma 1B even rivals much larger models, achieving 94% of the performance of Gemma 3 27B. This represents a massive efficiency gain: near-state-of-the-art performance in a package 27 times smaller.
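For readers who want to reproduce this kind of evaluation, both metrics are implemented in the sacrebleu library; chrF++ is chrF with word unigrams and bigrams enabled (word_order=2). The sentences below are invented stand-ins, not Crane AI Labs' evaluation data.

```python
# Scoring translations with BLEU and chrF++ via sacrebleu (invented example data).
import sacrebleu

hypotheses = ["Habari za asubuhi, hujambo?"]      # model outputs
references = [["Habari ya asubuhi, hujambo?"]]    # one reference stream, parallel to hypotheses

bleu = sacrebleu.corpus_bleu(hypotheses, references)
# chrF++ = character n-gram F-score with word unigrams/bigrams added (word_order=2)
chrfpp = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)

print(f"BLEU:   {bleu.score:.1f}")
print(f"chrF++: {chrfpp.score:.1f}")
```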
This incredible performance all happens on-device. As Kato puts it, "the ultimate 'cost-per-inference' is close to zero. Because Gemma is efficient enough to run on-device, we completely eliminate recurring cloud inference costs." According to Kato, this on-device approach may be the only economic model that is sustainable for a free consumer app in Africa. It also dramatically lowers the barrier to entry for enterprise partners looking to adopt these models.
During the TANO app's beta stage, Swahili Gemma 1B and similarly fine-tuned Gemma models have been positively received, attracting partnerships with organizations like MTN, Enabel, the Infectious Diseases Institute, Africa's Talking, Ugandan banks, and Ugandan government ministries. Beta testers include developers, teachers, students, health workers, and farmers.
What’s next
Crane AI Labs has a multi-phase strategy to become the leading sovereign AI platform in Sub-Saharan Africa. Following Swahili Gemma 1B, the team has released a Luganda language model that has already posted impressive benchmark results for its size.
Being able to get help with my homework in Luganda, offline, is a game-changer. It feels like this was built for me.
Student at Makerere University, on the TANO app
Over the next two years, they plan to build on the success of these models by further improving their accuracy and performance and by releasing models for other regional African languages like Kinyarwanda and Acholi. "Stop waiting for perfect internet infrastructure," says Bakunga. "The efficiency of models like Gemma means you can build world-class, offline-first AI for your community today."
Beyond Africa, the team is considering a focus on Arabic in the Middle East, where there are similar challenges of linguistic accuracy and internet connectivity. Eventually, the team hopes to bring their offline-first model expertise to Southeast Asia and Latin America.
The team plans to keep Gemma at the core of their models and is experimenting with building multimodal features using Gemma 3n. Crane AI Labs also hopes to go even smaller with its models for more resource-constrained edge use cases.
"I'm passionate about a world where access to education is independent of language," concluded Bakunga. "I believe that through speech-to-speech AI systems we can help deliver education to Africa and many other parts of the world with constrained resources."