This is a fine-tune of Ministral-3-3B-Instruct-2512, produced at the request of redaihf using P-E-W's Heretic (v1.3.0) abliteration engine with Arbitrary-Rank Ablation (ARA) enabled.
Heretication Results
| Score Metric | Value | Parameter | Value |
|---|---|---|---|
| Refusals | 2/416 | start_layer_index | 10 |
| KL Divergence | 0.0216 | end_layer_index | 25 |
| Initial Refusals | 401/416 | preserve_good_behavior_weight | 0.9095 |
| | | steer_bad_behavior_weight | 0.0001 |
| | | overcorrect_relative_weight | 1.0111 |
| | | neighbor_count | 8 |
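This card does not describe how Heretic's Arbitrary-Rank Ablation works internally. As a rough illustration of the general idea behind abliteration, the sketch below removes a rank-k "refusal" subspace from the output space of a weight matrix, W' = (I - QQ^T)W, but only for layers inside the start/end layer index range from the table above. Everything here is an assumption for illustration: the helper names are hypothetical, the end index is treated as inclusive, how the refusal directions are found is out of scope, and the preserve/steer/overcorrect weights and neighbor_count are not modeled at all.

```python
import math

# Hypothetical helpers illustrating rank-k directional ablation in pure Python.
# This is NOT Heretic's implementation; it only shows the projection W' = (I - Q Q^T) W.

Matrix = list[list[float]]


def orthonormalize(directions: list[list[float]]) -> list[list[float]]:
    """Gram-Schmidt: turn k (possibly non-orthogonal) directions into an orthonormal basis."""
    basis: list[list[float]] = []
    for v in directions:
        w = v[:]
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:  # skip directions already spanned by the basis
            basis.append([a / norm for a in w])
    return basis


def ablate_subspace(weight: Matrix, basis: list[list[float]]) -> Matrix:
    """Return (I - Q Q^T) W, where W has d_out rows and each basis vector has length d_out."""
    d_out, d_in = len(weight), len(weight[0])
    result = [row[:] for row in weight]
    for q in basis:
        # coeffs[j] = q . (column j of W); valid per-vector because the basis is orthonormal
        coeffs = [sum(q[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
        for i in range(d_out):
            for j in range(d_in):
                result[i][j] -= q[i] * coeffs[j]
    return result


def ablate_layers(layers: dict[int, Matrix], directions: list[list[float]],
                  start_layer_index: int, end_layer_index: int) -> dict[int, Matrix]:
    """Ablate only layers with start <= index <= end (inclusive end is an assumption)."""
    basis = orthonormalize(directions)
    return {
        idx: ablate_subspace(w, basis) if start_layer_index <= idx <= end_layer_index else w
        for idx, w in layers.items()
    }
```

For example, `ablate_layers({9: W, 10: W}, [[1.0, 0.0, 0.0]], 10, 25)` leaves layer 9 untouched and zeroes the first output row of layer 10, since that row is exactly the component along the single ablated direction. Passing several directions is what makes the ablation higher-rank; Gram-Schmidt is used so that overlapping directions are not subtracted twice.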
Appendix
Empty system prompt.
Heretication Rituals
[Trial 140] Refusals: 0/416, KL divergence: 3.5422
[Trial 257] Refusals: 1/416, KL divergence: 0.0255
» [Trial 233] Refusals: 2/416, KL divergence: 0.0216
[Trial 149] Refusals: 10/416, KL divergence: 0.0190
[Trial 187] Refusals: 16/416, KL divergence: 0.0172
[Trial 90] Refusals: 21/416, KL divergence: 0.0144
[Trial 3] Refusals: 45/416, KL divergence: 0.0136
[Trial 295] Refusals: 98/416, KL divergence: 0.0090
[Trial 292] Refusals: 107/416, KL divergence: 0.0075
[Trial 202] Refusals: 165/416, KL divergence: 0.0054
[Trial 48] Refusals: 270/416, KL divergence: 0.0048
[Trial 274] Refusals: 284/416, KL divergence: 0.0048
[Trial 201] Refusals: 309/416, KL divergence: 0.0024
[Trial 221] Refusals: 380/416, KL divergence: 0.0014
[Trial 168] Refusals: 401/416, KL divergence: 0.0000
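To read the trial log above as rates rather than raw counts, the sketch below parses lines in exactly the format shown and converts refusal counts into percentages. It is a hypothetical helper, not part of Heretic; it treats a leading » as the marker that, in this log, flags Trial 233, the trial whose scores match the results table.

```python
import re
from dataclasses import dataclass

# Matches lines such as "» [Trial 233] Refusals: 2/416, KL divergence: 0.0216"
TRIAL_RE = re.compile(
    r"^(?P<marker>»\s*)?\[Trial (?P<trial>\d+)\] "
    r"Refusals: (?P<refusals>\d+)/(?P<total>\d+), KL divergence: (?P<kl>[\d.]+)$"
)


@dataclass
class Trial:
    number: int
    refusals: int
    total: int
    kl_divergence: float
    selected: bool

    @property
    def refusal_rate(self) -> float:
        return self.refusals / self.total


def parse_trials(log: str) -> list[Trial]:
    trials = []
    for line in log.strip().splitlines():
        m = TRIAL_RE.match(line.strip())
        if m is None:
            continue  # ignore anything that is not a trial line
        trials.append(Trial(
            number=int(m["trial"]),
            refusals=int(m["refusals"]),
            total=int(m["total"]),
            kl_divergence=float(m["kl"]),
            selected=m["marker"] is not None,
        ))
    return trials


log = """
[Trial 257] Refusals: 1/416, KL divergence: 0.0255
» [Trial 233] Refusals: 2/416, KL divergence: 0.0216
[Trial 168] Refusals: 401/416, KL divergence: 0.0000
"""
for t in parse_trials(log):
    flag = "*" if t.selected else " "
    print(f"{flag} Trial {t.number:>3}: {t.refusal_rate:6.2%} refused, KL {t.kl_divergence:.4f}")
```

In these terms, the released trial's 2/416 is about 0.48% of prompts refused, against 401/416 (about 96.4%) before ablation.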
PIQA Benchmarks
PIQA benchmark results were taken into account when selecting the final trial for release. T233, T257 and T149 refer to Trials 233, 257 and 149 in the log above.
| Benchmark | acc,none | acc_stderr,none | acc_norm,none | acc_norm_stderr,none |
|---|---|---|---|---|
| PIQA Base | 0.7720 | 0.0098 | 0.7753 | 0.0097 |
| PIQA T233 | 0.7758 | 0.0097 | 0.7829 | 0.0096 |
| PIQA T257 | 0.7742 | 0.0098 | 0.7748 | 0.0097 |
| PIQA T149 | 0.7715 | 0.0098 | 0.7791 | 0.0097 |
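As a back-of-the-envelope way to read the PIQA differences against their reported standard errors, the sketch below expresses each trial's difference from the base model in units of the combined standard error, sqrt(se_a² + se_b²). This assumes the two errors are independent, which is not strictly true because every model is scored on the same PIQA items (a paired comparison over per-item results would be tighter), so treat it as a rough check rather than a significance test.

```python
import math

# (acc, acc_stderr, acc_norm, acc_norm_stderr) copied from the PIQA results above
PIQA = {
    "Base": (0.7720, 0.0098, 0.7753, 0.0097),
    "T233": (0.7758, 0.0097, 0.7829, 0.0096),
    "T257": (0.7742, 0.0098, 0.7748, 0.0097),
    "T149": (0.7715, 0.0098, 0.7791, 0.0097),
}


def diff_in_stderrs(a: float, se_a: float, b: float, se_b: float) -> float:
    """(b - a) divided by the combined standard error, assuming independent errors."""
    return (b - a) / math.sqrt(se_a ** 2 + se_b ** 2)


base = PIQA["Base"]
for name, row in PIQA.items():
    if name == "Base":
        continue
    z_acc = diff_in_stderrs(base[0], base[1], row[0], row[1])
    z_norm = diff_in_stderrs(base[2], base[3], row[2], row[3])
    print(f"{name}: acc {row[0] - base[0]:+.4f} ({z_acc:+.2f} SE), "
          f"acc_norm {row[2] - base[2]:+.4f} ({z_norm:+.2f} SE)")
```

For T233 this works out to roughly +0.28 combined standard errors on acc and +0.56 on acc_norm.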
Ministral 3 3B Instruct 2512 BF16
The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.
This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat and instruction-based use cases.
The Ministral 3 family is designed for edge deployment and can run on a wide range of hardware. Ministral 3 3B can even be deployed locally: it fits in 16GB of VRAM in BF16, and in less than 8GB of RAM/VRAM when quantized.
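The memory figures can be sanity-checked with weight-only arithmetic: parameters × bits per parameter ÷ 8. The sketch below uses the 3.4B language model + 0.4B vision encoder split listed under Key Features. It deliberately ignores the KV cache, activations and runtime overhead, which grow with context length and account for the gap between raw weight size and the quoted budgets. The 4-bit row is an illustrative quantization level, not a statement about which quantized checkpoints exist.

```python
# Weight-only memory estimate; KV cache, activations and framework overhead are NOT included.
LANGUAGE_MODEL_PARAMS = 3.4e9
VISION_ENCODER_PARAMS = 0.4e9
TOTAL_PARAMS = LANGUAGE_MODEL_PARAMS + VISION_ENCODER_PARAMS


def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, bits in [("BF16", 16), ("FP8", 8), ("4-bit (illustrative)", 4)]:
    print(f"{label:>22}: ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB of weights")
```

This gives about 7.6 GB of weights in BF16 and about 3.8 GB at 8 bits per parameter, leaving the rest of each quoted budget for context-dependent memory.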
We provide a no-loss FP8 version here; other formats and quantizations are available in the Ministral 3 - Additional Checkpoints collection.
Learn more in our blog post and paper.
Key Features
Ministral 3 3B consists of two main architectural components:
- 3.4B Language Model
- 0.4B Vision Encoder
The Ministral 3 3B Instruct model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON output.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
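Native function calling is commonly consumed through an OpenAI-style `tools` schema when a model is served behind an OpenAI-compatible chat endpoint (vLLM is one example). The offline sketch below shows a tool definition in that format and a small dispatcher that executes a returned tool call and builds the follow-up `tool` message. The `get_weather` tool, its schema, and the sample tool call are hypothetical and hand-written, not real model output; check the exact tool-call format your serving stack returns against its own documentation.

```python
import json

# Hypothetical tool, described in the OpenAI-style "tools" schema accepted by
# OpenAI-compatible chat servers. The tool and its parameters are illustrative only.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current temperature for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]


def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub implementation; a real tool would query a weather service.
    return {"city": city, "temperature": 21, "unit": unit}


REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> str:
    """Execute one tool call and return a JSON string to send back as a 'tool' message."""
    fn = tool_call["function"]
    handler = REGISTRY[fn["name"]]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON-encoded string
    return json.dumps(handler(**args))


# Hand-written example of the kind of tool call a server might return.
sample_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": json.dumps({"city": "Paris"})},
}
tool_message = {"role": "tool", "tool_call_id": sample_call["id"],
                "content": run_tool_call(sample_call)}
print(tool_message)
```

In a real loop, `TOOLS` would be sent with the chat request, and `tool_message` appended to the conversation so the model can produce its final answer from the tool result.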
Use Cases
Ideal for lightweight, real-time applications on edge or low-resource devices, such as:
- Image captioning
- Text classification
- Real-time efficient translation
- Data extraction
- Short content generation
- Fine-tuning and specialization
- And more...
Bringing advanced AI capabilities to edge and distributed environments for embedded systems.
Ministral 3 Family
| Model Name | Type | Precision | Link |
|---|---|---|---|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | BF16 | Hugging Face |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | BF16 | Hugging Face |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | BF16 | Hugging Face |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
Other formats available here.
Benchmark Results
We compare Ministral 3 to similarly sized models.
Reasoning
| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.850 | 0.898 | 0.712 | 0.646 |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
| Ministral 3 8B | 0.787 | 0.860 | 0.668 | 0.616 |
| Qwen3-VL-8B-Thinking | 0.798 | 0.860 | 0.671 | 0.580 |
| Ministral 3 3B | 0.721 | 0.775 | 0.534 | 0.548 |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | 0.601 | 0.513 |
Instruct
| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.551 | 68.5 | 0.904 | 8.49 |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
| Ministral 3 8B | 0.509 | 66.8 | 0.876 | 8.08 |
| Qwen3-VL-8B-Instruct | 0.528 | 66.3 | 0.946 | 8.00 |
| Ministral 3 3B | 0.305 | 56.8 | 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | 0.438 | 56.8 | 0.900 | 8.01 |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |
Base
| Model | Multilingual MMLU | MATH CoT 2-shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---|---|---|---|---|---|---|
| Ministral 3 14B | 0.742 | 0.676 | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | 0.754 | 0.620 | 0.661 | 0.837 | 0.804 | 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | 0.788 |
| Ministral 3 8B | 0.706 | 0.626 | 0.591 | 0.793 | 0.761 | 0.681 |
| Qwen 3 8B Base | 0.700 | 0.576 | 0.596 | 0.794 | 0.760 | 0.639 |
| Ministral 3 3B | 0.652 | 0.601 | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | 0.677 | 0.405 | 0.570 | 0.759 | 0.713 | 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | 0.640 |
License
This model is licensed under the Apache 2.0 License.
You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.
Model tree for MuXodious/Ministral-3-3B-Instruct-2512-ARA-heresy
Base model: mistralai/Ministral-3-3B-Base-2512