Google Launches Gemma 4: Four Open Models with Frontier-Level Efficiency
Summary
- Gemma 4 released in four sizes — E2B, E4B, 26B MoE, 31B Dense — under Apache 2.0 license
- 31B model ranks #3 and 26B ranks #6 among all open models on Arena AI text leaderboard
- All models support multimodal input (video, images); edge variants add native audio input
- Context windows up to 256K tokens; natively trained on 140+ languages
- Gemma ecosystem surpasses 400 million downloads and 100,000 community variants
Details
Gemma 4 released in four sizes under Apache 2.0
Google released Gemma 4 in E2B, E4B, 26B MoE, and 31B Dense configurations. The Apache 2.0 license allows commercial use and modification without restrictions, making these models broadly accessible for enterprise and research deployment.
31B model ranks #3 and 26B ranks #6 among open models globally
Rankings are based on the Arena AI text leaderboard as of April 1, 2026. Google states the 26B MoE model outperforms models with 20x more parameters, underscoring the efficiency gains achieved in this generation.
Native agentic workflow support across the entire model family
All Gemma 4 models include native function-calling, structured JSON output, and system instruction handling. These features enable developers to build autonomous agents that interact with external APIs and execute multi-step workflows without additional scaffolding.
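These agentic features map onto the OpenAI-compatible chat completion schema that most open-model servers (including llama.cpp's server, mentioned in the community reactions below) expose. A minimal sketch of a function-calling request body follows; the tool name, model id, and endpoint conventions are illustrative assumptions, not details from the Gemma 4 announcement.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible schema;
# the "get_weather" tool is an illustrative placeholder.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body combining system instructions, a user turn, and the tool
# list; a model with native function-calling can answer with a structured
# tool_call instead of free text. The model id is an assumed name.
request_body = {
    "model": "gemma-4-e4b",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What's the weather in Zurich?"},
    ],
    "tools": [weather_tool],
}

print(json.dumps(request_body, indent=2))
```

Because the schema is server-agnostic, the same payload shape works whether the model runs behind a local llama.cpp server or a hosted endpoint.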
Full multimodal support including video, images; audio on edge models
All four models process images and video at variable resolutions, with particular strength in OCR and chart understanding. The E2B and E4B edge models additionally support native audio input for speech recognition and understanding.
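Image input in OpenAI-compatible servers is typically passed as a content-parts message with a base64 data URL. A sketch of what an OCR/chart-reading request message might look like, with placeholder image bytes (the format is an assumption about the serving layer, not a documented Gemma 4 API):

```python
import base64
import json

# Placeholder bytes standing in for a real PNG of a chart.
fake_png = base64.b64encode(b"\x89PNG placeholder bytes").decode()

# Multimodal user turn in the OpenAI-style content-parts format:
# one text part with the instruction, one image part with a data URL.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Extract the totals from this chart."},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{fake_png}"},
        },
    ],
}

print(json.dumps(message)[:80])
```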
Context windows of 128K (edge) and 256K (larger models)
The E2B and E4B models support 128K token context windows for long documents and extended conversations. The 26B and 31B models extend this to 256K tokens, enabling full repository ingestion or book-length documents in a single prompt.
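A rough fit check makes the 256K figure concrete. This sketch uses the common ~4-characters-per-token heuristic; the actual Gemma tokenizer will yield different exact counts.

```python
# Estimate whether a text fits in the 256K-token window of the
# 26B/31B models, assuming ~4 characters per token (a rough
# heuristic, not the real tokenizer's ratio).
CONTEXT_WINDOW = 256_000

def fits_in_context(num_chars: int, chars_per_token: float = 4.0) -> bool:
    """Return True if the estimated token count fits in the window."""
    est_tokens = num_chars / chars_per_token
    return est_tokens <= CONTEXT_WINDOW

# A 300-page book at ~2,000 characters per page is ~600K chars,
# roughly 150K tokens: comfortably inside the window.
book_chars = 300 * 2_000
print(fits_in_context(book_chars))   # True

# A large repository at ~5M characters (~1.25M tokens) would not fit.
print(fits_in_context(5_000_000))    # False
```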
Models natively trained on 140+ languages
Gemma 4's multilingual training coverage is designed to support global application development without relying on translation layers, which is particularly relevant for developers building localized products in lower-resource languages.
Gemma ecosystem: 400M+ downloads and 100,000+ community variants
These figures span all Gemma generations since the first release. The scale of community adoption — fine-tunes, quantizations, and derivative models — reflects the practical utility of the open-weight format for downstream customization.
Yale University used Gemma for Cell2Sentence-Scale cancer therapy research
Google cited this collaboration as a concrete example of high-impact fine-tuning: the project explored new pathways for cancer therapy using the model's ability to process and reason over biological data.
Gemma 4 shares research lineage with Gemini 3
Google positions Gemma 4 as built from the same underlying research and technology as its flagship Gemini 3 proprietary models, suggesting architectural and training advances flow from the closed to the open model line.
Edge-first sizing targets Android devices and laptop GPUs
The E2B and E4B models are explicitly designed for billions of Android devices and consumer hardware, reflecting a strategy to embed Gemma into on-device applications where latency, privacy, and cost make cloud inference impractical.
What This Means
Gemma 4 raises the ceiling for what developers can deploy on commodity hardware — a 31B model ranking third among all open models globally means teams can now access near-frontier reasoning without cloud dependency or large GPU clusters. The inclusion of native agentic features, long context, and full multimodal support across all four sizes makes this a strong foundation for production AI applications beyond research use. With Apache 2.0 licensing and 400 million downloads already behind the Gemma brand, this release is positioned to become a significant reference point in open model benchmarking and community development through 2026.
Sentiment
Broadly excited about local efficiency and benchmark performance
“Google Gemma 4 is here - and it delivers 🤯 Here's HOW TO run it on your hardware (runs on most devices) with llama.cpp to give you a Chat UI + OpenAI chat completion endpoint instantly!”
“Gemma 4 is here! The best open-source model you can run on your machine. Day-0 support in a llama.cpp. Check it out!”
“Gemma-4-31B is now live in Text Arena - ranking #3 among open models (#27 overall), matching much larger models at 10× smaller scale! ... Congrats to @GoogleDeepMind on a major step forward for open models!”
“Qwen 3.5 27B is still the top scoring model in this category of memory usage... I don't believe there is currently any hint at Gemma 4 being currently the top local inference model.”
Split
~90/10 positive/skeptical — most praise efficiency and benchmarks, with minor doubts about whether it tops rivals like Qwen for local inference.
