
Mistral Unveils Three Compact Models

Mistral is releasing models as prolifically as a litter of puppies: three new models launched in a single week, all modest in size but leading in their class.


*Mistral NeMo A 12B LLM with a 128K context length. The model excels in reasoning, world knowledge, and coding accuracy.

Mistral NeMo was trained with quantization awareness, enabling FP8 inference without loss of quality.
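As a rough illustration (not part of the announcement), the FP8 path can be exercised with an off-the-shelf serving stack such as vLLM, which supports on-the-fly FP8 weight quantization. The repository ID and the context cap below are assumptions to be checked against the model card, and an FP8-capable GPU (e.g. Hopper or Ada) is assumed.

```python
# Sketch: FP8 inference for Mistral NeMo via vLLM.
# "mistralai/Mistral-Nemo-Instruct-2407" is an assumed Hugging Face repo id.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed repo id
    quantization="fp8",        # on-the-fly FP8 weight quantization in vLLM
    max_model_len=16384,       # cap well below the 128K context to fit one GPU
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Explain FP8 inference in one paragraph."], params)
print(outputs[0].outputs[0].text)
```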

Trained on multilingual data, it is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.

Mistral NeMo uses a new Tiktoken-based tokenizer, Tekken, trained on more than 100 languages. It compresses natural-language text and source code more efficiently than the SentencePiece tokenizer used in earlier Mistral models.
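One way to make the compression claim concrete is to count the tokens each tokenizer produces for the same text via Hugging Face transformers. The repository IDs below are assumptions, and the exact ratios will vary by language and domain.

```python
# Sketch: compare token counts of the Tekken tokenizer (Mistral NeMo) against the
# SentencePiece tokenizer of earlier Mistral models (repo ids are assumptions).
from transformers import AutoTokenizer

tekken = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
sentencepiece = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

samples = {
    "english": "Large language models compress text into discrete tokens.",
    "code": "def fib(n):\n    return n if n < 2 else fib(n - 1) + fib(n - 2)",
    "korean": "대형 언어 모델은 텍스트를 토큰으로 압축합니다.",
}

for name, text in samples.items():
    n_new = len(tekken.encode(text, add_special_tokens=False))
    n_old = len(sentencepiece.encode(text, add_special_tokens=False))
    print(f"{name:8s}  Tekken: {n_new:3d} tokens   SentencePiece: {n_old:3d} tokens")
```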

Advanced fine-tuning and alignment have been applied to Mistral NeMo. It significantly outperforms Mistral 7B at following precise instructions, reasoning, handling multi-turn conversations, and generating code.
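A minimal multi-turn sketch using the transformers chat template is shown below; the repository ID and generation settings are assumptions for illustration, not part of the announcement.

```python
# Sketch: multi-turn chat with Mistral NeMo via transformers chat templates.
# "mistralai/Mistral-Nemo-Instruct-2407" is an assumed repo id for the instruct checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function that reverses a string."},
    {"role": "assistant", "content": "def reverse(s: str) -> str:\n    return s[::-1]"},
    {"role": "user", "content": "Now make it return an empty string when given None."},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```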

*MathΣtral Mathematical Model A 7B model designed specifically for mathematical reasoning and scientific discovery, with a 32K context window and an Apache 2.0 open-source license. The model scores 56.6% on MATH and 63.47% on MMLU. Most importantly, its reasoning capabilities let it achieve significantly better results when given more inference time. Coincidentally, the launch day marks the 2,311th anniversary of Archimedes' birth.
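"More inference time" in this context typically means sampling several candidate solutions and aggregating them, for example by majority vote over the final answers (self-consistency). The sketch below assumes the checkpoint is published as "mistralai/Mathstral-7B-v0.1" and uses a naive answer extractor; both are assumptions for illustration.

```python
# Sketch: trading inference time for accuracy with majority voting (self-consistency).
# The repo id and the answer-extraction heuristic are illustrative assumptions.
from collections import Counter
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mathstral-7B-v0.1")  # assumed repo id
params = SamplingParams(n=16, temperature=0.7, max_tokens=512)  # 16 samples per problem

problem = "What is the sum of all positive divisors of 36? End with 'Answer: <number>'."
candidates = llm.generate([problem], params)[0].outputs

def extract(text: str) -> str:
    # Naive heuristic: take whatever follows the last "Answer:" marker.
    if "Answer:" not in text:
        return ""
    tail = text.rsplit("Answer:", 1)[-1].strip()
    return tail.split()[0] if tail else ""

answers = [extract(c.text) for c in candidates]
votes = Counter(a for a in answers if a)
print(votes.most_common(3))  # the most frequent answer wins
```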

*Codestral Mamba Code Model Unlike Transformer architectures, the Mamba architecture offers linear-time inference and can, in theory, model sequences of unbounded length. This allows extensive interaction with the model and rapid responses regardless of input length, which makes it particularly well suited to code generation. Codestral Mamba supports contexts of up to 256K tokens.
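The scaling argument behind that claim: a Transformer's attention re-reads a key-value cache that grows with the sequence, so per-token cost grows with history and total cost grows quadratically, while a state-space model like Mamba updates a fixed-size state, keeping per-token cost constant. The back-of-the-envelope sketch below models only those asymptotics, not the actual architectures.

```python
# Sketch: per-token work as context grows, attention vs. fixed-state recurrence.
# Numbers are illustrative unit counts, not measurements of either model.

def attention_cost(seq_len: int) -> int:
    # Each new token attends over the whole KV cache accumulated so far.
    return sum(t for t in range(1, seq_len + 1))   # O(L^2) total work

def recurrent_cost(seq_len: int, state_size: int = 1) -> int:
    # Each new token touches a fixed-size state, regardless of history length.
    return seq_len * state_size                    # O(L) total work

for L in (1_000, 10_000, 100_000):
    print(f"L={L:>7,}  attention≈{attention_cost(L):>14,}  fixed-state≈{recurrent_cost(L):>9,}")
```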

Codestral Mamba can be deployed with the mistral-inference SDK, which relies on the reference implementation from the Mamba GitHub repository. The model can also be deployed via TensorRT-LLM.
