Instructions to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with llama-cpp-python:

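# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the F16 GGUF from the Hugging Face Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
	repo_id="EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF",
	filename="Ministral-3-3B-Reasoning-2512-F16.gguf",
)

# Send one user turn containing a text part and an image part.
# Note: image input generally needs a vision-capable chat handler loaded with
# one of the mmproj files; without one, the image part may be ignored or rejected.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

# The reply text is in the first choice of the OpenAI-style response.
print(response["choices"][0]["message"]["content"])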

Local Apps

llama.cpp

How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M

Use Docker

docker model run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
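
Once a server from the steps above is running, it can be queried from Python as well as from the web UI. The following is a minimal sketch using only the standard library. It assumes the server listens on 127.0.0.1:8080 (the usual llama-server default; adjust `BASE_URL` if you passed a different host or port), and the helper names `build_chat_request` and `chat` are made up for this example.

```python
import json
import urllib.request

# Assumption: the server listens on the usual llama-server default address.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(prompt, max_tokens=512):
    """Build an OpenAI-style chat completion payload for a text-only prompt."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url=BASE_URL, timeout=120):
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Explain GGUF in one sentence."))` should print the model's reply. Nothing is sent until `chat` is called, so the payload can be inspected first with `build_chat_request`.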


vLLM

How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
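
The request above points at a remote image URL. For a local image, OpenAI-compatible servers generally also accept a base64 `data:` URL in the same `image_url` field. The sketch below builds such a request body in Python; the helper names are hypothetical, and whether this particular GGUF build handles images under vLLM is an assumption that should be checked against your own setup.

```python
import base64
import json

MODEL_ID = "EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF"


def image_bytes_to_data_url(image_bytes, mime_type="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_request(image_bytes, question, mime_type="image/jpeg"):
    """Build a chat completion body with one text part and one inline image part."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_bytes_to_data_url(image_bytes, mime_type)
                        },
                    },
                ],
            }
        ],
    }
```

To use it, read the image with `open(path, "rb").read()`, pass the bytes to `build_image_request`, write `json.dumps(...)` of the result to a file, and send that file with curl's `--data @file.json` in place of the inline JSON.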

Use Docker

docker model run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M

Ollama
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Ollama:
```
ollama run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
```

Unsloth Studio

How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF to start chatting

Docker Model Runner
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Docker Model Runner:
```
docker model run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
```

Lemonade

How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Ministral-3-3B-Reasoning-2512-GGUF-Q4_K_M

List all available models

lemonade list

------------------------------------------------
- Model Details and Specifications: -
------------------------------------------------
Ministral-3 3B Reasoning 2512 (GGUF)

--------------------

NOTICE:

I noticed after testing (post-upload)
that the template doesn't like to play nice
(doesn't seem to engage the thinking tags correctly)
when pulled from HuggingFace,
I will be correcting this today/tomorrow at the latest!

--------------------

This release contains:
GGUF-converted and quantized model files, compatible with both Ollama and llama.cpp.

Quantized GGUF version of:

Ministral-3-3B-Reasoning-2512-BF16
(by MistralAI)

Original Model Link:

mistralai/Ministral-3-3B-Reasoning-2512

Description:
This release includes GGUF model files (compatible with Ollama and llama.cpp) and two working multi-modal projector (mmproj) files for the vision projector, offering full capabilities in either Ollama or llama.cpp.

What is the "Custom Tokenizer Chat Template"?
As opposed to the standard "Chat Template" made available by MistralAI, this release of GGUF converted and quantized files offers a fully custom Tokenizer Chat Template intended to provide smoother, faster, more efficient, and more reliable interaction/inference with the model. This template sheds the "fluff" (non-primary logic) from the JINJA Chat/Tokenizer Template, which is meant to improve speed, quality, and context adherence during inference without sacrificing any aspect of the initial release by MistralAI.

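For reference - here is the new Tokenizer Chat-Template. Note that, as written, it uses Go template syntax (the format Ollama reads from a Modelfile) rather than Jinja:
(This template features a sliding context window of FORTY-SIX (46) messages, where each user or assistant turn counts as one. The window may be adjusted to individual requirements by altering the number forty-seven (47) on the fourth (4th) line of this template: a higher value widens the sliding context window and a lower value narrows it. The number is one more than the window size because the template uses a strict less-than comparison.)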

{{- $remMessage := false }}
[SYSTEM_PROMPT]{{- "🟦 Follow instructions that the user provides. Think and respond to the user in the language they use or request. Next sections describes the capabilities that you have. \n\n🟦 [Reasoning Instructions]\nYou have the ability to think before responding to the user. Always start your response by thinking, using an internal monologue. Always use this template when you respond: <think> thoughts and internal monologue </think> then respond directly to the user.\n\n🟦 [Multi-Modal Instructions]\nYou have the ability to read images." }}[/SYSTEM_PROMPT]
{{- range $index, $_ := .Messages }}
{{- if lt (len (slice $.Messages $index)) 47 }}
{{- $remMessage = true }}
{{- end }}
{{- if $remMessage }}{{- if eq .Role "user" }}
[INST]{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{ .Content }}{{- end }}
{{- end }}
{{- end }}
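
The window logic in the template above can be easier to follow when written out in Python. The sketch below mirrors it under stated assumptions: a message is kept when the number of messages from its position to the end of the history is below the threshold (47), only user and assistant turns are rendered, and whitespace handling is simplified compared with the real template. The function names are made up for this illustration.

```python
def select_window(messages, threshold=47):
    """Keep each message whose remaining count (itself through the end) is below threshold."""
    total = len(messages)
    return [m for i, m in enumerate(messages) if total - i < threshold]


def render_prompt(messages, system_prompt, threshold=47):
    """Render the kept messages in the [SYSTEM_PROMPT] / [INST] layout."""
    parts = [f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]"]
    for message in select_window(messages, threshold):
        if message["role"] == "user":
            parts.append(f"[INST]{message['content']}[/INST]")
        elif message["role"] == "assistant":
            parts.append(message["content"])
        # Any other role is skipped, as in the template.
    return "".join(parts)


# 50 alternating turns: the oldest 4 fall outside the window.
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"m{i}"}
    for i in range(50)
]
kept = select_window(history)
print(len(kept), kept[0]["content"])  # prints: 46 m4
```

With a threshold of 47 the window holds at most 46 messages, which is why the value in the template is one more than the advertised window size.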

No modifications, edits, or configurations are required to use this model with Ollama or llama.cpp; it works natively! Both Vision and Text work with Ollama as well. (^.^)
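
The system prompt in the template above asks the model to wrap its reasoning in <think> ... </think> before answering, and the notice earlier on this card says the tags may not always engage. A client that wants to show only the final answer therefore probably needs to handle both cases. Here is a minimal sketch of that; `split_reasoning` is a hypothetical helper, not part of any library.

```python
import re

# Non-greedy match so only the first <think> ... </think> block is captured.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text):
    """Return (reasoning, answer); reasoning is None when no <think> block is found."""
    match = THINK_PATTERN.search(text)
    if match is None:
        return None, text.strip()
    reasoning = match.group(1).strip()
    # The answer is whatever remains once the think block is cut out.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


print(split_reasoning("<think> 2+2 is 4 </think> The answer is 4."))
# prints: ('2+2 is 4', 'The answer is 4.')
```

When the tags are missing, the whole reply comes back as the answer and the reasoning is `None`, so a caller can detect that the template did not engage.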

Coming Soon!!!
Check back occasionally, as an automated installer/configuration Python 3 script is making its way to all of my releases! The script sets up the model for Ollama, giving anyone interested in these models a hassle-free experience (Ollama only for now; optimizations for other software are coming later). Until then, it is highly recommended to use the Ollama "create" command along with the supplied ".modelfile" to ensure proper configuration and get the most out of this particular release. The Python 3 installer/configuration tool will handle these steps for those who choose to use it.
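
Until the installer script is available, the manual route can be sketched in a few lines of Python. This is an illustration only: the supplied ".modelfile" from this release should be preferred over a generated one, and the helper names, the model name `ministral-3b-reasoning`, and the `num_ctx` value used in the note below are placeholders chosen for the example. It relies only on the standard Modelfile instructions FROM, TEMPLATE, and PARAMETER and on the `ollama create NAME -f FILE` command.

```python
def build_modelfile(gguf_path, template=None, parameters=None):
    """Return Modelfile text pointing Ollama at a local GGUF file."""
    lines = [f"FROM {gguf_path}"]
    if template is not None:
        # Triple quotes let the template span several lines.
        lines.append(f'TEMPLATE """{template}"""')
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


def create_command(model_name, modelfile_path):
    """Return the argument list for `ollama create`."""
    return ["ollama", "create", model_name, "-f", modelfile_path]
```

A typical use would write `build_modelfile("./Ministral-3-3B-Reasoning-2512-F16.gguf", parameters={"num_ctx": 8192})` to a file named `Modelfile` and then run `subprocess.run(create_command("ministral-3b-reasoning", "Modelfile"), check=True)`. Both steps are left out of the block so that nothing touches the disk or calls Ollama until you decide to.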

Happy Inferencing!
-- Jon Z (EnlistedGhost)

Model Updates (As of: March 26, 2026)

Updated: Uploaded/Added all GGUF conversion(s) and non-i-matrix Quantized model file(s)
Final Quantized and full-F16 modelfiles are uploaded!!! - Check back for i-Matrix quant model files if you do not see your desired edition (They are being uploaded, thank you for your patience!)

-------------------------------------------------------------
- GGUF Conversion and Quantization Details: -
-------------------------------------------------------------

Software used to convert Safetensors to GGUF:

llama.cpp | Version: 8189

Software used to create Quantized GGUF Files:

llama.cpp | Version: 8189

Specific GitHub Commit Point:

b8189

Converted to GGUF and Quantized by:

EnlistedGhost

Downloads last month: 245

GGUF

Model size

3B params

Architecture

mistral3

Hardware compatibility

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit
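
As a rough guide to what the bit widths above mean for download size and memory, the weight storage can be approximated as parameters × bits per weight ÷ 8. The sketch below applies that to the nominal 3B parameter count. It is an estimate under loud assumptions: the true parameter count is not exactly 3 billion, GGUF quantization types mix precisions and store per-block scales, and the mmproj files and runtime context memory are not included, so real files will differ.

```python
def estimate_weight_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 3_000_000_000  # assumption: "3B" taken literally

for bits in (2, 3, 4, 5, 6, 8, 16):
    size = estimate_weight_size_gb(NOMINAL_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{size:.2f} GB")
```

Under these assumptions a 4-bit file lands near 1.5 GB and the 16-bit file near 6 GB, which is a quick way to judge which edition fits a given machine before downloading.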

Model tree for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF

Base model

mistralai/Ministral-3-3B-Base-2512

Finetuned

mistralai/Ministral-3-3B-Reasoning-2512

Quantized

(29)

this model

EnlistedGhost
/

Ministral-3-3B-Reasoning-2512-GGUF
