Mistral, the French AI darling, is keeping the new releases coming this summer.
Just days after announcing its own optimized cloud service, Mistral Compute, the well-funded company has released an update to its 24B-parameter open source model Mistral Small, jumping from version 3.1 to Mistral-Small-3.2-24B-Instruct-2506.
The new version builds directly on Mistral Small 3.1, aiming to improve specific behaviors such as instruction following, output stability, and function calling robustness. While the overall architecture remains unchanged, the update introduces targeted refinements that show up in both internal evaluations and public benchmarks.
According to Mistral AI, Small 3.2 is better at adhering to precise instructions and reduces the likelihood of infinite or repetitive generations, a problem occasionally seen in previous versions when handling long or ambiguous prompts.
Similarly, the function calling template has been upgraded to support more reliable tool-use scenarios, particularly in frameworks such as vLLM.
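Mistral publishes the exact chat and tool templates in the model repository; as an illustrative sketch only, a tool-use request in the OpenAI-style schema that vLLM's OpenAI-compatible server accepts might look like the following (the `get_weather` function and its fields are hypothetical, not part of Mistral's release):

```python
import json

# Hypothetical tool definition in the OpenAI-style schema accepted by
# vLLM's OpenAI-compatible endpoint. The function name and parameters
# are illustrative assumptions, not from Mistral's repository.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body for a chat completion that lets the model decide
# whether to call the tool.
request_body = {
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_body, indent=2))
```

When the model elects to use the tool, the response contains a structured `tool_calls` entry rather than free text, which is the behavior the 3.2 template update is meant to make more reliable.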
And at the same time, it can run on a single NVIDIA A100/H100 80 GB GPU, dramatically opening up the options for companies with tight compute resources and/or budgets.
An updated model just 3 months after the last one
Mistral Small 3.1 was announced in March 2025 as a flagship open release in the 24B parameter range. It offered full multimodal capabilities, multilingual understanding, and long-context processing of up to 128k tokens.
The model was explicitly positioned against proprietary peers such as GPT-4o Mini, Claude 3.5 Haiku, and Gemma 3-it, and, according to Mistral, outperformed them on many tasks.
Small 3.1 also emphasized efficient deployment, with claims of inference speeds of 150 tokens per second and support for running on devices with 32 GB of RAM.
That release came with both base and instruct checkpoints, offering flexibility for fine-tuning across domains such as legal, medical, and technical fields.
In contrast, Small 3.2 focuses on surgical improvements to behavior and reliability. It does not aim to introduce new capabilities or architecture changes. Instead, it acts as a maintenance release: cleaning up edge cases in output generation, tightening instruction compliance, and refining system prompt interactions.
Small 3.2 vs. Small 3.1: What changed?
Instruction-following benchmarks show a small but measurable improvement. Mistral's internal accuracy increased from 82.75% in Small 3.1 to 84.78% in Small 3.2.

Similarly, performance on external datasets such as WildBench v2 and Arena Hard v2 improved significantly: WildBench rose by nearly 10 percentage points, while Arena Hard more than doubled, jumping from 19.56% to 43.10%.
Internal metrics also suggest reduced output repetition. The rate of infinite generations fell from 2.11% in Small 3.1 to 1.29% in Small 3.2, a nearly 2× reduction. This makes the model more reliable for developers building applications that require consistent, bounded responses.
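Those rates describe the model itself, but developers typically add a client-side guard as well. A minimal sketch of one such guard (the heuristic and thresholds here are assumptions for illustration, not Mistral's method) flags a response whose tail keeps repeating the same n-gram:

```python
def looks_repetitive(text: str, ngram: int = 6, min_repeats: int = 4) -> bool:
    """Heuristic check: does the tail of `text` repeat its last
    `ngram` words at least `min_repeats` times in a row?
    Thresholds are illustrative assumptions, not tuned values."""
    words = text.split()
    if len(words) < ngram * min_repeats:
        return False
    tail = words[-ngram:]
    # Walk backwards in ngram-sized chunks, counting consecutive matches.
    repeats = 0
    for start in range(len(words) - ngram, -1, -ngram):
        if words[start:start + ngram] == tail:
            repeats += 1
        else:
            break
    return repeats >= min_repeats

# A degenerate loop is flagged; a short normal reply is not.
print(looks_repetitive("the answer is " * 20))            # True
print(looks_repetitive("this is a perfectly fine reply")) # False
```

A caller could pair this with a `max_tokens` limit and retry the request when the check fires, rather than streaming a runaway generation to the end user.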
Performance across text and coding benchmarks presents a more nuanced picture. Small 3.2 showed gains on HumanEval Plus (88.99% to 92.90%), MBPP Pass@5 (74.63% to 78.33%), and SimpleQA. It also modestly improved MMLU Pro and MATH results.
Vision benchmarks remain largely consistent, with slight fluctuations. ChartQA and DocVQA saw marginal gains, while AI2D and MathVista dipped by less than two percentage points. Average vision performance decreased slightly from 81.39% in Small 3.1 to 81.00% in Small 3.2.
This aligns with Mistral's stated intent: Small 3.2 is not a model overhaul, but a refinement. As such, most benchmarks sit within the expected variance, and some regressions appear to be trade-offs for targeted improvements elsewhere.
However, as AI power user and influencer @chatgpt21 posted on X, the model "got worse on MMLU", the Massive Multitask Language Understanding benchmark, a multidisciplinary test spanning 57 subjects. Indeed, Small 3.2 scored 80.50%, slightly below Small 3.1's 80.62%.
The open source license will make it more attractive to cost-conscious and customization-focused users
Both Small 3.1 and 3.2 are available under the Apache 2.0 license and can be accessed via Hugging Face, the popular AI code-sharing repository (itself a startup based in France and New York).
Small 3.2 is supported by frameworks such as vLLM and Transformers and requires approximately 55 GB of GPU RAM to run in bf16 or fp16 precision.
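That ~55 GB figure is roughly what the parameter count implies. As a back-of-the-envelope check (the overhead margin for KV cache, activations, and runtime buffers is our assumption, not a Mistral number):

```python
# Back-of-the-envelope GPU memory estimate for a 24B-parameter model
# in bf16/fp16, which use 2 bytes per parameter. The ~15% overhead for
# KV cache, activations, and runtime buffers is an assumption.
params = 24e9
bytes_per_param = 2  # bf16 or fp16
weights_gb = params * bytes_per_param / 1e9
estimated_total_gb = weights_gb * 1.15

print(f"weights: {weights_gb:.0f} GB")                # 48 GB
print(f"with overhead: {estimated_total_gb:.0f} GB")  # 55 GB
```

The arithmetic also explains the single-GPU claim: ~55 GB fits comfortably on an 80 GB A100 or H100, but not on consumer cards without quantization.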
For developers looking to build or serve applications, system prompt examples and inference examples are provided in the model repository.
While Mistral Small 3.1 is already integrated into platforms such as Google Cloud Vertex AI and is slated for availability on NVIDIA NIM and Microsoft Azure, Small 3.2 currently appears limited to self-serve access via Hugging Face and direct deployment.
What companies should know when considering Mistral Small 3.2 for their use cases
Mistral Small 3.2 may not shift the competitive landscape in the open-weight model space, but it reinforces Mistral AI's commitment to iterative model refinement.
With noticeable improvements in reliability and task handling, particularly around instruction accuracy and tool use, Small 3.2 offers a cleaner user experience for developers and enterprises building on the Mistral ecosystem.
The fact that it is made by a French startup and complies with EU rules and regulations such as the GDPR and the EU AI Act also makes it appealing to enterprises operating in that part of the world.
Still, for those seeking the biggest jumps in benchmark performance, Small 3.1 remains a reference point, especially since in some cases, such as MMLU, Small 3.2 does not surpass its predecessor. That makes the update more of a stability-focused option than a pure upgrade, depending on the use case.