Gemma 3 4B helps Levro build a model training platform with lower inference costs
Neobank Levro enables businesses to easily build efficient domain-specific models with Gemma
Levro, a multi-currency neobank for businesses, is evolving into a platform that helps companies train domain-specific models. That shift began when the fintech wanted to deploy its own AI banking agent to assist its customers. Dissatisfied with the results from other state-of-the-art (SOTA) models, the team decided to base their agent on Gemma to optimize performance and reduce cost.
Informed by that experience, Levro set out to simplify the development and training of models for businesses without machine learning expertise. The result was Levro L-1, a new platform powered by Gemma 3 4B that enables businesses to build custom, private AI solutions, which Levro found to be faster and more efficient than those built on other open models.
The challenge
Many businesses could benefit from integrating AI into their products or workflows but lack the necessary skills or experience to do so. Even with the right personnel, the resource and time commitment for in-house AI model development—from selection and training to deployment—can be prohibitive for organizations requiring rapid deployment.
Levro set out to develop an AI support agent for its banking business and tested many SOTA open models against these requirements. But the developers found that the models struggled to handle natural language requests as well as more complex, domain-specific tasks, such as calling APIs to provide customers with information about their accounts. "The models often ignored function parameter descriptions, called the wrong functions, or improperly formatted arguments," said Cathy Han, Levro co-founder and CEO. Han also cited latency and cost as issues, calling the results they received "unacceptable."
After fine-tuning with Gemma, the model outperformed other SOTA models across internal benchmarks, all while running faster and achieving a 98% cost reduction for our fintech agent.
Cathy Han, Levro co-founder & CEO
The solution
Levro discovered that, after fine-tuning, Gemma 3 4B overcame many of these barriers. The fine-tuning combined reinforcement learning on Levro's APIs with natural language processing, producing a smaller, domain-specific model that can handle more complex requests. By running on-premises, the agent also enhances data security and dramatically improves response times. Perhaps the biggest gain came in how much Gemma could reduce compute costs: prices for other models can reach as high as $5 per million tokens, but Levro found that Gemma 3 4B cost only 31 cents per million tokens and delivered results faster.
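As described in the next paragraphs, Levro serves its models with vLLM. The sketch below shows what an on-premises vLLM deployment of Gemma 3 4B can look like; the model checkpoint and prompt are illustrative assumptions, not Levro's production setup.

```python
# Illustrative sketch (not Levro's production code): serving Gemma 3 4B
# on-premises with vLLM for low-latency, private inference.
from vllm import LLM, SamplingParams

# Load the publicly available instruction-tuned Gemma 3 4B weights onto
# local hardware; no customer data leaves the machine.
llm = LLM(model="google/gemma-3-4b-it")

sampling = SamplingParams(temperature=0.2, max_tokens=256)

# Hypothetical domain-style request an on-prem banking agent might handle.
prompts = ["What is the current USD balance for account 12345?"]
outputs = llm.generate(prompts, sampling)

for output in outputs:
    print(output.outputs[0].text)
```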
Following their success with the agent, the team set out to create Levro L-1, a full-stack platform to allow developers to build sophisticated agentic applications without requiring deep AI expertise. Once again, the team tested Gemma 3 4B against other SOTA open models and found that Gemma delivered better accuracy and speed while still remaining lighter and more cost-efficient.
Levro L-1 uses reinforcement learning to train models on each client's domain-specific data and APIs, enabling them to perform complex, specialized tasks like agentic customer support, technical solutions engineering, or business intelligence querying. Reinforcement learning is done with the Hugging Face TRL library along with Unsloth, which the developers patched to modify the GRPO algorithm. Levro uses vLLM for inference.
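Levro's training code isn't public, but the minimal sketch below shows the general shape of a GRPO run with TRL on a function-calling style task. The dataset, reward function, model checkpoint, and hyperparameters are illustrative assumptions, and it omits the Unsloth integration and the custom GRPO changes mentioned above.

```python
# Minimal sketch (not Levro's actual pipeline): GRPO fine-tuning of Gemma 3 4B
# with Hugging Face TRL on a toy function-calling task.
import json

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompts asking the model to emit domain-specific API calls.
train_dataset = Dataset.from_dict({
    "prompt": [
        "Return the JSON function call that fetches the USD balance for account 12345.",
        "Return the JSON function call that lists last month's EUR transactions for account 67890.",
    ]
})

def reward_well_formed_call(completions, **kwargs):
    """Toy reward: 1.0 if the completion parses as a JSON object with a 'name' field, else 0.0."""
    rewards = []
    for completion in completions:
        try:
            call = json.loads(completion)
            rewards.append(1.0 if isinstance(call, dict) and "name" in call else 0.0)
        except (json.JSONDecodeError, TypeError):
            rewards.append(0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="gemma3-4b-grpo-sketch",
    per_device_train_batch_size=4,
    num_generations=4,           # completions sampled per prompt for the group baseline
    max_completion_length=128,
)

trainer = GRPOTrainer(
    model="google/gemma-3-4b-it",        # Gemma 3 4B instruction-tuned checkpoint
    reward_funcs=reward_well_formed_call,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

In practice, the reward would be tied to the client's own API schemas (correct function selected, arguments valid against the spec), which is what steers a small model like Gemma 3 4B toward reliable domain-specific tool use.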
The impact
Thanks to reinforcement learning on domain-specific data, Levro found that its models are smaller, more accurate, and better fit to each client's unique use case, making them more cost-effective than general-purpose models. The models delivered by Levro L-1 also offer clients a great deal of security: because Gemma is an open model, clients can run it privately and securely on-premises to protect sensitive data and control API access.
The biggest impact Gemma had for Levro L-1 over other SOTA open models was in cost-per-inference efficiency. With Gemma, Levro L-1 can train a model for its clients for less than $500 in fewer than 24 hours—an incredible value to businesses without the skills or means to train their own model.
What’s next
Levro is enhancing Levro L-1 with advanced training techniques to improve its performance on complex queries. This approach boosts capability without increasing costs, opening the door for more sophisticated use cases. The team has also been testing how Gemma 3 12B can be used for more advanced training.
As Levro continues to improve Levro L-1, the team intends to keep Gemma at its core. "Gemma will be a key part of the reinforcement learning platform we provide to our users," said Han. "We plan to use it as our default base model for future reinforcement learning runs."