Mistral Small 4: Unified 119B MoE Model Released Under Apache 2.0

Models · 2 sources · Mar 17

Summary

  • Mistral Small 4 unifies reasoning, multimodal, and agentic coding in one model
  • 119B total parameters with MoE architecture, only 6B active per token
  • 40% latency reduction and 3x throughput improvement over Mistral Small 3
  • Fully open source under Apache 2.0 with 256k context window

Details

1. Product Launch

Mistral Small 4 released as unified flagship open-source model

The model merges capabilities from three previously separate Mistral models: Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding). Released under Apache 2.0, it is free to use, fine-tune, and deploy commercially.

2. Tech Info

MoE architecture: 128 experts, 4 active per token, 119B total / 6B active parameters

The Mixture of Experts design means only 4 of 128 experts activate per token, keeping the active parameter count at 6B (8B including embedding and output layers). This lets the model rival dense models well above its 6B active-parameter count while keeping per-token inference cost close to that of a small model.
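The routing mechanism behind those numbers can be sketched in a few lines. This is a simplified illustration, not Mistral's implementation: the hidden size (512) and the random weights are placeholders, and production routers also normalize the selected gate scores and apply load-balancing losses.

```python
import numpy as np

N_EXPERTS = 128  # total experts per MoE layer, per the release notes
TOP_K = 4        # experts activated for each token

def route(token_hidden: np.ndarray, gate_w: np.ndarray) -> np.ndarray:
    """Select the top-k experts for one token via a learned gating layer."""
    logits = token_hidden @ gate_w          # one router score per expert
    top_k = np.argsort(logits)[-TOP_K:]     # indices of the 4 highest scores
    return np.sort(top_k)

rng = np.random.default_rng(0)
hidden = rng.standard_normal(512)               # hypothetical hidden state
gate = rng.standard_normal((512, N_EXPERTS))    # hypothetical router weights
experts = route(hidden, gate)                   # 4 distinct expert indices
```

Because only the selected experts' weights participate in the forward pass, the compute per token scales with the 6B active parameters rather than the 119B total.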

3. New Tech

Configurable reasoning effort via reasoning_effort parameter

Users can set reasoning_effort to 'none' for fast, direct responses or 'high' for deep chain-of-thought reasoning. This gives developers flexible control over latency versus answer quality on a per-request basis.
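In practice this would be set per request. The sketch below only assembles a chat-completions-style payload; the endpoint, the `mistral-small-4` model identifier, and the exact field names are assumptions based on the article, not official API documentation.

```python
def build_request(prompt: str, effort: str = "none") -> dict:
    """Build a request payload with a per-call reasoning_effort setting.

    'none' favors fast, direct responses; 'high' requests deep
    chain-of-thought reasoning (field names are assumed, not official).
    """
    assert effort in {"none", "high"}
    return {
        "model": "mistral-small-4",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

fast = build_request("Summarize this log line.", effort="none")
deep = build_request("Prove this loop terminates.", effort="high")
```

The point is the shape of the trade-off: the same deployment serves both latency-sensitive and quality-sensitive calls, switched by one field.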

4. New Tech

Native multimodality: text and image inputs supported out of the box

Unlike prior small Mistral models, Small 4 natively accepts image inputs alongside text, bringing vision capability to the same model used for reasoning and coding without requiring a separate deployment.
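A mixed text-and-image request typically looks like a single message with multiple content parts. The sketch below follows the common OpenAI-style content-parts convention; whether Mistral Small 4 uses this exact schema is an assumption.

```python
import base64

def image_message(text: str, image_bytes: bytes) -> dict:
    """Pack text plus an inline base64-encoded image into one user message."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

# Placeholder bytes stand in for a real PNG file read from disk.
msg = image_message("What is in this chart?", b"\x89PNG placeholder")
```

Because vision is native to the model, this message goes to the same endpoint as text-only reasoning or coding requests rather than to a separate vision deployment.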

5. Stat

40% reduction in end-to-end completion time versus prior generation

Measured in a latency-optimized deployment setup. Throughput-optimized setups achieve 3x more requests per second compared to Mistral Small 3, making the upgrade significant for production API deployments.

6. Stat

Matches or surpasses GPT-OSS 120B on three benchmarks with shorter outputs

On AA LCR the model scores 0.72 using 1.6K characters, while Qwen requires 3.5-4x more output to reach comparable performance. On LiveCodeBench it outperforms GPT-OSS 120B while producing 20% less output.

7. Infrastructure

Minimum hardware: 4x NVIDIA HGX H100, 2x HGX H200, or 1x DGX B200

These are the floor requirements for self-hosting. The model is compatible with vLLM, llama.cpp, SGLang, and Hugging Face Transformers, covering the major open-source inference stacks.
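Rough weight-memory arithmetic shows why these configurations are the floor. The per-GPU HBM capacities below are public NVIDIA specs (H100 80GB, H200 141GB); the estimate covers weights only and deliberately ignores KV cache, activations, and runtime overhead, which is why real deployments need the headroom.

```python
def weight_gib(params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return params * bytes_per_param / 2**30

TOTAL_PARAMS = 119e9               # all 128 experts must be resident in HBM

bf16 = weight_gib(TOTAL_PARAMS, 2)  # ~222 GiB at 16-bit precision
fp8 = weight_gib(TOTAL_PARAMS, 1)   # ~111 GiB at 8-bit precision

# Aggregate HBM of the listed minimum configs:
h100_pool = 4 * 80   # 4x H100 -> 320 GiB
h200_pool = 2 * 141  # 2x H200 -> 282 GiB
```

Even though only 6B parameters are active per token, all 119B must sit in GPU memory, so bf16 weights (~222 GiB) fit the 4x H100 and 2x H200 pools only with modest headroom left for cache and activations.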

8. Partnership

Mistral joins NVIDIA Nemotron Coalition as a founding member

The partnership includes inference optimization collaboration on vLLM and SGLang. Joining as a founding member signals deeper integration with NVIDIA's enterprise AI ecosystem beyond a standard hardware dependency.

9. Strategy

Single model replaces three separate specialized deployments

Previously, teams needing reasoning, vision, and coding support from Mistral would run distinct models. Consolidating into one model simplifies infrastructure, reduces operational overhead, and lowers the barrier for organizations wanting broad AI capability in a self-hosted setup.

Product Launch = new release; Tech Info = architectural detail; New Tech = new capability; Stat = benchmark or performance figure; Infrastructure = hardware/deployment requirement; Partnership = external collaboration; Strategy = business or product positioning

What This Means

Mistral Small 4 is a meaningful step toward open-source models that compete with frontier proprietary systems across a wide range of tasks — reasoning, vision, and coding — without requiring separate deployments. The Apache 2.0 license means any organization can self-host, fine-tune, and commercialize the model freely, which matters most to enterprises with data privacy constraints or cost sensitivity. The efficiency gains from its MoE design make it practical on hardware teams may already own, and benchmark results suggest it punches above its active-parameter weight against larger dense models. This release increases competitive pressure on both proprietary API providers and other open-weight labs.
