Gemma 3n acts as the core engine behind CastFox’s semantic discovery, smart highlights, and contextual chat features
Guru Network Limited, a global entertainment and game company, built the CastFox platform to transform how users engage with podcasts. The app achieved 1 million downloads in the first three weeks.
By turning passive, long-form audio into an interactive knowledge base, they sought to move beyond simple playback. To achieve this, the CastFox engineering team needed a robust, efficient AI model to fuel its semantic discovery, smart highlights, and in-episode chats. They chose Gemma 3n to power these core features, allowing them to rapidly scale their vision on a startup budget.
Gemma 3n enables semantic discovery, smart highlights, and in-episode chats in the CastFox app.
The CastFox team aimed to shift the podcast experience from isolated, episode-specific listening to richer, topic-level exploration. That meant allowing users to search by concept and meaning (not just keywords) across the app’s podcast catalog, generate quick topic overviews before diving into full episodes, and even have interactive conversations with the audio itself.
To make this goal a reality, the team needed a scalable, cost-effective way to process massive volumes of audio in English, Japanese, and Korean. This processing was essential to generating the summaries, highlights, and Q&A pairs that power the app's main features. Early tests revealed that many API-based models were either cost-prohibitive at their scale or lacked the necessary flexibility for the team’s specific multilingual use case.
After evaluating large proprietary models like GPT and Claude, as well as Whisper pipelines, CastFox developers adopted Gemma 3n E4B for its balance of precision, speed, and value. Gemma's ability to handle multilingual content and to reliably return structured outputs like JSON was a key factor in the team’s decision. Its open nature also made it a strong fit: Because the model is low-cost and easy to self-host, developers could prototype and refine quickly. “Gemma’s low cost and easy deployment let teams iterate fast and scale affordably,” said Chong Wu, Head of AI at Guru Network Limited. “It’s a great entry point for real-world AI integration.”
The team runs Gemma 3n using Ollama on lightweight GPU and CPU infrastructure, a setup that avoids the high cost and complexity of model retraining. Instead of fine-tuning, developers discovered they could achieve high accuracy through prompt engineering and light post-processing, such as validating JSON schemas and smoothing timestamps.
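The light post-processing mentioned above might look something like the following minimal sketch. It is not CastFox's actual code: the `segments`, `start`, `end`, and `title` field names and both helper functions are hypothetical, chosen only to illustrate validating a model's JSON output and smoothing segment timestamps.

```python
import json

# Hypothetical schema: these field names are assumptions for illustration,
# not CastFox's actual output format.
REQUIRED_FIELDS = {"start": (int, float), "end": (int, float), "title": str}


def parse_segments(raw: str) -> list[dict]:
    """Parse model output and keep only segments that match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output; the caller could retry the prompt
    if not isinstance(data, dict) or not isinstance(data.get("segments"), list):
        return []
    valid = []
    for seg in data["segments"]:
        if isinstance(seg, dict) and all(
            isinstance(seg.get(key), kind) for key, kind in REQUIRED_FIELDS.items()
        ):
            valid.append(seg)
    return valid


def smooth_timestamps(segments: list[dict], duration: float) -> list[dict]:
    """Clamp timestamps to the episode length and remove overlaps."""
    smoothed = []
    cursor = 0.0  # end of the previous kept segment
    for seg in sorted(segments, key=lambda s: s["start"]):
        start = min(max(seg["start"], cursor), duration)
        end = min(max(seg["end"], start), duration)
        if end > start:  # drop segments that collapse to zero length
            smoothed.append({**seg, "start": start, "end": end})
            cursor = end
    return smoothed
```

In this sketch, a segment that overlaps the previous one is pulled forward to start where the previous one ends, and anything running past the episode duration is clamped to it.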
Gemma 3n is efficient, open, and performs well across languages—ideal for startups building AI-rich applications.
This setup now powers CastFox's entire audio understanding backend. During preprocessing, an episode is transcribed and then parsed to generate summaries, auto-segments, and candidate Q&A pairs, all of which are stored in JSON for later retrieval. In this way, Gemma 3n serves as the core engine behind CastFox’s semantic search and interactive features: “AI lets us turn passive media like podcasts into active learning experiences,” said Wu. “Without it, CastFox would just be another basic podcast player.”
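As a rough illustration of the preprocessing step, the sketch below sends a transcript to a locally running Ollama server and asks for a JSON reply. The endpoint is Ollama's default local address; the `gemma3n:e4b` model tag, the prompt wording, and the `summary` and `qa_pairs` keys are assumptions made for this example, not details confirmed by the CastFox team.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "gemma3n:e4b"  # assumed model tag; use whichever tag is pulled locally


def build_request(transcript: str) -> dict:
    """Build the request body for a single non-streaming, JSON-formatted call."""
    prompt = (
        "Summarize the podcast transcript below and propose Q&A pairs. "
        'Reply with JSON only, using the keys "summary" and "qa_pairs".\n\n'
        + transcript
    )
    return {"model": MODEL, "prompt": prompt, "format": "json", "stream": False}


def analyze_transcript(transcript: str) -> dict:
    """Send the transcript to Ollama and return the parsed JSON the model produced."""
    body = json.dumps(build_request(transcript)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        envelope = json.load(response)
    # With "stream": False, the generated text arrives in the "response" field.
    return json.loads(envelope["response"])
```

The parsed result could then be written to storage as JSON for later retrieval, in line with the flow described above.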
Gemma 3n delivers robust comprehension across English, Korean, and Japanese while maintaining fast processing times: A 30-second audio clip takes ~40 seconds, 300-400 character text summaries take ~6 seconds, and generating recommended questions from long text takes ~12 seconds.
Gemma 3n provides strong comprehension across English, Korean, and Japanese while keeping responses fast and consistent—impressive for a compact model.
The self-hosted approach is highly cost-effective. At roughly $0.0007 per request, the team can process content at scale. To do so, they run preprocessing on AWS Spot Instances, which use preemptible capacity, instead of more expensive reserved nodes, bringing their representative rate to around $10/day.
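To put the two quoted figures side by side, here is a back-of-the-envelope calculation. It assumes the daily spend is made up entirely of per-request cost, which the article does not state, so the result is only an order-of-magnitude illustration. The function names are hypothetical.

```python
COST_PER_REQUEST_USD = 0.0007  # per-request figure quoted above
DAILY_SPEND_USD = 10.0         # representative daily rate quoted above


def requests_per_day(daily_spend: float, cost_per_request: float) -> int:
    """Approximate request volume a daily budget covers (assumes purely per-request cost)."""
    return round(daily_spend / cost_per_request)


def daily_cost(requests: int, cost_per_request: float) -> float:
    """Approximate daily spend for a given request volume."""
    return requests * cost_per_request


if __name__ == "__main__":
    print(requests_per_day(DAILY_SPEND_USD, COST_PER_REQUEST_USD))
```

Under that assumption, the budget works out to roughly 14,000 requests per day.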
User engagement with the AI features has been strong, confirmed by high retention metrics and positive user reviews highlighting the “chat with podcasts” functionality. Backed by Gemma, CastFox surpassed 1 million downloads in its first three weeks, attracting a highly active and enthusiastic user base.
Wu believes the results speak for themselves: “Gemma 3n shows that smaller, open models can deliver real impact.”