Upload README.md with huggingface_hub

41214e1 verified 4 months ago

9.68 kB

	---
	library_name: vllm
	language:
	- en
	- fr
	- es
	- de
	- it
	- pt
	- nl
	- zh
	- ja
	- ko
	- ar
	license: apache-2.0
	inference: false
	base_model:
	- mistralai/Ministral-3-8B-Base-2512
	extra_gated_description: If you want to learn more about how we process your personal
	data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
	tags:
	- mistral-common
	- heretic
	- uncensored
	- decensored
	- abliterated
	---
	# This is a decensored version of [mistralai/Ministral-3-8B-Instruct-2512-BF16](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512-BF16), made using [Heretic](https://github.com/p-e-w/heretic) v1.2.0

	## Abliteration parameters

	\| Parameter \| Value \|
	\| :-------- \| :---: \|
	\| direction_index \| per layer \|
	\| attn.o_proj.max_weight \| 1.97 \|
	\| attn.o_proj.max_weight_position \| 17.48 \|
	\| attn.o_proj.min_weight \| 1.90 \|
	\| attn.o_proj.min_weight_distance \| 10.79 \|
	\| mlp.down_proj.max_weight \| 0.19 \|
	\| mlp.down_proj.max_weight_position \| 8.56 \|
	\| mlp.down_proj.min_weight \| 0.04 \|
	\| mlp.down_proj.min_weight_distance \| 15.62 \|

	## Performance

	\| Metric \| This model \| Original model ([mistralai/Ministral-3-8B-Instruct-2512-BF16](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512-BF16)) \|
	\| :----- \| :--------: \| :---------------------------: \|
	\| KL divergence \| 0.0509 \| 0 (by definition) \|
	\| Refusals \| 8/100 \| 91/100 \|

	-----


	# Ministral 3 8B Instruct 2512 BF16

	A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

	This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.

	The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.

	We provide a no-loss FP8 version [here](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512), you can find other formats and quantizations in the [Ministral 3 - Additional Checkpoints](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints) collection.

	Learn more in our [blog post](https://mistral.ai/news/mistral-3) and [paper](https://arxiv.org/abs/2601.08584).

	## Key Features
	Ministral 3 8B consists of two main architectural components:
	- 8.4B Language Model
	- 0.4B Vision Encoder

	The Ministral 3 8B Instruct model offers the following capabilities:
	- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
	- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
	- System Prompt: Maintains strong adherence and support for system prompts.
	- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
	- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
	- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
	- Large Context Window: Supports a 256k context window.

	### Use Cases
	Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.
	- Chat interfaces in constrained environments
	- Local daily-driver AI assistant
	- Image/document description and understanding
	- Translation and content generation
	- Specialized agentic use cases
	- Fine-tuning and specialization
	- And more...

	Bringing advanced AI capabilities to resource-constrained environments.

	## Ministral 3 Family

	\| Model Name \| Type \| Precision \| Link \|
	\|--------------------------------\|--------------------\|-----------\|------------------------------------------------------------------------------------------\|
	\| Ministral 3 3B Base 2512 \| Base pre-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) \|
	\| Ministral 3 3B Instruct 2512 \| Instruct post-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) \|
	\| Ministral 3 3B Reasoning 2512 \| Reasoning capable \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) \|
	\| Ministral 3 8B Base 2512 \| Base pre-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) \|
	\| Ministral 3 8B Instruct 2512 \| Instruct post-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) \|
	\| Ministral 3 8B Reasoning 2512 \| Reasoning capable \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) \|
	\| Ministral 3 14B Base 2512 \| Base pre-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) \|
	\| Ministral 3 14B Instruct 2512 \| Instruct post-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) \|
	\| Ministral 3 14B Reasoning 2512 \| Reasoning capable \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) \|

	Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).

	## Benchmark Results

	We compare Ministral 3 to similar sized models.

	### Reasoning

	\| Model \| AIME25 \| AIME24 \| GPQA Diamond \| LiveCodeBench \|
	\|---------------------------\|-------------\|-------------\|--------------\|---------------\|
	\| Ministral 3 14B \| <u>0.850</u>\| <u>0.898</u>\| <u>0.712</u> \| <u>0.646</u> \|
	\| Qwen3-14B (Thinking) \| 0.737 \| 0.837 \| 0.663 \| 0.593 \|
	\| \| \| \| \| \|
	\| Ministral 3 8B \| 0.787 \| <u>0.860</u>\| 0.668 \| <u>0.616</u> \|
	\| Qwen3-VL-8B-Thinking \| <u>0.798</u>\| <u>0.860</u>\| <u>0.671</u> \| 0.580 \|
	\| \| \| \| \| \|
	\| Ministral 3 3B \| <u>0.721</u>\| <u>0.775</u>\| 0.534 \| <u>0.548</u> \|
	\| Qwen3-VL-4B-Thinking \| 0.697 \| 0.729 \| <u>0.601</u> \| 0.513 \|

	### Instruct

	\| Model \| Arena Hard \| WildBench \| MATH Maj@1 \| MM MTBench \|
	\|---------------------------\|-------------\|------------\|-------------\|------------------\|
	\| Ministral 3 14B \| <u>0.551</u>\| <u>68.5</u>\| <u>0.904</u>\| <u>8.49</u> \|
	\| Qwen3 14B (Non-Thinking) \| 0.427 \| 65.1 \| 0.870 \| NOT MULTIMODAL \|
	\| Gemma3-12B-Instruct \| 0.436 \| 63.2 \| 0.854 \| 6.70 \|
	\| \| \| \| \| \|
	\| Ministral 3 8B \| 0.509 \| <u>66.8</u>\| 0.876 \| <u>8.08</u> \|
	\| Qwen3-VL-8B-Instruct \| <u>0.528</u>\| 66.3 \| <u>0.946</u>\| 8.00 \|
	\| \| \| \| \| \|
	\| Ministral 3 3B \| 0.305 \| <u>56.8</u>\| 0.830 \| 7.83 \|
	\| Qwen3-VL-4B-Instruct \| <u>0.438</u>\| <u>56.8</u>\| <u>0.900</u>\| <u>8.01</u> \|
	\| Qwen3-VL-2B-Instruct \| 0.163 \| 42.2 \| 0.786 \| 6.36 \|
	\| Gemma3-4B-Instruct \| 0.318 \| 49.1 \| 0.759 \| 5.23 \|

	### Base

	\| Model \| Multilingual MMLU \| MATH CoT 2-Shot \| AGIEval 5-shot \| MMLU Redux 5-shot \| MMLU 5-shot \| TriviaQA 5-shot \|
	\|---------------------\|-------------------\|-----------------\|----------------\|-------------------\|-------------\|-----------------\|
	\| Ministral 3 14B \| 0.742 \| <u>0.676</u> \| 0.648 \| 0.820 \| 0.794 \| 0.749 \|
	\| Qwen3 14B Base \| <u>0.754</u> \| 0.620 \| <u>0.661</u> \| <u>0.837</u> \| <u>0.804</u>\| 0.703 \|
	\| Gemma 3 12B Base \| 0.690 \| 0.487 \| 0.587 \| 0.766 \| 0.745 \| <u>0.788</u> \|
	\| \| \| \| \| \| \| \|
	\| Ministral 3 8B \| <u>0.706</u> \| <u>0.626</u> \| 0.591 \| 0.793 \| <u>0.761</u>\| <u>0.681</u> \|
	\| Qwen 3 8B Base \| 0.700 \| 0.576 \| <u>0.596</u> \| <u>0.794</u> \| 0.760 \| 0.639 \|
	\| \| \| \| \| \| \| \|
	\| Ministral 3 3B \| 0.652 \| <u>0.601</u> \| 0.511 \| 0.735 \| 0.707 \| 0.592 \|
	\| Qwen 3 4B Base \| <u>0.677</u> \| 0.405 \| <u>0.570</u> \| <u>0.759</u> \| <u>0.713</u>\| 0.530 \|
	\| Gemma 3 4B Base \| 0.516 \| 0.294 \| 0.430 \| 0.626 \| 0.589 \| <u>0.640</u> \|

	## License

	This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

	You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.

	---
	library_name: vllm
	language:
	- en
	- fr
	- es
	- de
	- it
	- pt
	- nl
	- zh
	- ja
	- ko
	- ar
	license: apache-2.0
	inference: false
	base_model:
	- mistralai/Ministral-3-8B-Base-2512
	extra_gated_description: If you want to learn more about how we process your personal
	data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
	tags:
	- mistral-common
	- heretic
	- uncensored
	- decensored
	- abliterated
	---
	# This is a decensored version of [mistralai/Ministral-3-8B-Instruct-2512-BF16](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512-BF16), made using [Heretic](https://github.com/p-e-w/heretic) v1.2.0

	## Abliteration parameters

	\| Parameter \| Value \|
	\| :-------- \| :---: \|
	\| direction_index \| per layer \|
	\| attn.o_proj.max_weight \| 1.97 \|
	\| attn.o_proj.max_weight_position \| 17.48 \|
	\| attn.o_proj.min_weight \| 1.90 \|
	\| attn.o_proj.min_weight_distance \| 10.79 \|
	\| mlp.down_proj.max_weight \| 0.19 \|
	\| mlp.down_proj.max_weight_position \| 8.56 \|
	\| mlp.down_proj.min_weight \| 0.04 \|
	\| mlp.down_proj.min_weight_distance \| 15.62 \|

	## Performance

	\| Metric \| This model \| Original model ([mistralai/Ministral-3-8B-Instruct-2512-BF16](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512-BF16)) \|
	\| :----- \| :--------: \| :---------------------------: \|
	\| KL divergence \| 0.0509 \| 0 (by definition) \|
	\| Refusals \| 8/100 \| 91/100 \|

	-----


	# Ministral 3 8B Instruct 2512 BF16

	A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.

	This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction based use cases.

	The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.

	We provide a no-loss FP8 version [here](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512), you can find other formats and quantizations in the [Ministral 3 - Additional Checkpoints](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints) collection.

	Learn more in our [blog post](https://mistral.ai/news/mistral-3) and [paper](https://arxiv.org/abs/2601.08584).

	## Key Features
	Ministral 3 8B consists of two main architectural components:
	- 8.4B Language Model
	- 0.4B Vision Encoder

	The Ministral 3 8B Instruct model offers the following capabilities:
	- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
	- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
	- System Prompt: Maintains strong adherence and support for system prompts.
	- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
	- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
	- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
	- Large Context Window: Supports a 256k context window.

	### Use Cases
	Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.
	- Chat interfaces in constrained environments
	- Local daily-driver AI assistant
	- Image/document description and understanding
	- Translation and content generation
	- Specialized agentic use cases
	- Fine-tuning and specialization
	- And more...

	Bringing advanced AI capabilities to resource-constrained environments.

	## Ministral 3 Family

	\| Model Name \| Type \| Precision \| Link \|
	\|--------------------------------\|--------------------\|-----------\|------------------------------------------------------------------------------------------\|
	\| Ministral 3 3B Base 2512 \| Base pre-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) \|
	\| Ministral 3 3B Instruct 2512 \| Instruct post-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) \|
	\| Ministral 3 3B Reasoning 2512 \| Reasoning capable \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) \|
	\| Ministral 3 8B Base 2512 \| Base pre-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) \|
	\| Ministral 3 8B Instruct 2512 \| Instruct post-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) \|
	\| Ministral 3 8B Reasoning 2512 \| Reasoning capable \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) \|
	\| Ministral 3 14B Base 2512 \| Base pre-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) \|
	\| Ministral 3 14B Instruct 2512 \| Instruct post-trained \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) \|
	\| Ministral 3 14B Reasoning 2512 \| Reasoning capable \| BF16 \| [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) \|

	Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).

	## Benchmark Results

	We compare Ministral 3 to similar sized models.

	### Reasoning

	\| Model \| AIME25 \| AIME24 \| GPQA Diamond \| LiveCodeBench \|
	\|---------------------------\|-------------\|-------------\|--------------\|---------------\|
	\| Ministral 3 14B \| <u>0.850</u>\| <u>0.898</u>\| <u>0.712</u> \| <u>0.646</u> \|
	\| Qwen3-14B (Thinking) \| 0.737 \| 0.837 \| 0.663 \| 0.593 \|
	\| \| \| \| \| \|
	\| Ministral 3 8B \| 0.787 \| <u>0.860</u>\| 0.668 \| <u>0.616</u> \|
	\| Qwen3-VL-8B-Thinking \| <u>0.798</u>\| <u>0.860</u>\| <u>0.671</u> \| 0.580 \|
	\| \| \| \| \| \|
	\| Ministral 3 3B \| <u>0.721</u>\| <u>0.775</u>\| 0.534 \| <u>0.548</u> \|
	\| Qwen3-VL-4B-Thinking \| 0.697 \| 0.729 \| <u>0.601</u> \| 0.513 \|

	### Instruct

	\| Model \| Arena Hard \| WildBench \| MATH Maj@1 \| MM MTBench \|
	\|---------------------------\|-------------\|------------\|-------------\|------------------\|
	\| Ministral 3 14B \| <u>0.551</u>\| <u>68.5</u>\| <u>0.904</u>\| <u>8.49</u> \|
	\| Qwen3 14B (Non-Thinking) \| 0.427 \| 65.1 \| 0.870 \| NOT MULTIMODAL \|
	\| Gemma3-12B-Instruct \| 0.436 \| 63.2 \| 0.854 \| 6.70 \|
	\| \| \| \| \| \|
	\| Ministral 3 8B \| 0.509 \| <u>66.8</u>\| 0.876 \| <u>8.08</u> \|
	\| Qwen3-VL-8B-Instruct \| <u>0.528</u>\| 66.3 \| <u>0.946</u>\| 8.00 \|
	\| \| \| \| \| \|
	\| Ministral 3 3B \| 0.305 \| <u>56.8</u>\| 0.830 \| 7.83 \|
	\| Qwen3-VL-4B-Instruct \| <u>0.438</u>\| <u>56.8</u>\| <u>0.900</u>\| <u>8.01</u> \|
	\| Qwen3-VL-2B-Instruct \| 0.163 \| 42.2 \| 0.786 \| 6.36 \|
	\| Gemma3-4B-Instruct \| 0.318 \| 49.1 \| 0.759 \| 5.23 \|

	### Base

	\| Model \| Multilingual MMLU \| MATH CoT 2-Shot \| AGIEval 5-shot \| MMLU Redux 5-shot \| MMLU 5-shot \| TriviaQA 5-shot \|
	\|---------------------\|-------------------\|-----------------\|----------------\|-------------------\|-------------\|-----------------\|
	\| Ministral 3 14B \| 0.742 \| <u>0.676</u> \| 0.648 \| 0.820 \| 0.794 \| 0.749 \|
	\| Qwen3 14B Base \| <u>0.754</u> \| 0.620 \| <u>0.661</u> \| <u>0.837</u> \| <u>0.804</u>\| 0.703 \|
	\| Gemma 3 12B Base \| 0.690 \| 0.487 \| 0.587 \| 0.766 \| 0.745 \| <u>0.788</u> \|
	\| \| \| \| \| \| \| \|
	\| Ministral 3 8B \| <u>0.706</u> \| <u>0.626</u> \| 0.591 \| 0.793 \| <u>0.761</u>\| <u>0.681</u> \|
	\| Qwen 3 8B Base \| 0.700 \| 0.576 \| <u>0.596</u> \| <u>0.794</u> \| 0.760 \| 0.639 \|
	\| \| \| \| \| \| \| \|
	\| Ministral 3 3B \| 0.652 \| <u>0.601</u> \| 0.511 \| 0.735 \| 0.707 \| 0.592 \|
	\| Qwen 3 4B Base \| <u>0.677</u> \| 0.405 \| <u>0.570</u> \| <u>0.759</u> \| <u>0.713</u>\| 0.530 \|
	\| Gemma 3 4B Base \| 0.516 \| 0.294 \| 0.430 \| 0.626 \| 0.589 \| <u>0.640</u> \|

	## License

	This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).

	You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.