Instructions to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF",
    filename="Ministral-3-3B-Reasoning-2512-F16.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
Use Docker
docker model run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
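Once `llama-server` is running as shown in the commands above, its OpenAI-compatible API can be called from Python. The following is a minimal sketch using only the standard library. It assumes the server is listening on llama.cpp's default address `http://127.0.0.1:8080` (adjust `BASE_URL` if you started it with a different host or port); the helper names `build_chat_request` and `ask` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

# Assumption: llama-server was started with its default host and port.
BASE_URL = "http://127.0.0.1:8080"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("Describe GGUF in one sentence.")` should return the model's reply as a string. Because this is a reasoning model, the reply may include its thinking section depending on how the server and template are configured.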
- LM Studio
- Jan
- vLLM
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
- Ollama
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Ollama:
ollama run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
- Unsloth Studio
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF to start chatting
- Docker Model Runner
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Docker Model Runner:
docker model run hf.co/EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
- Lemonade
How to use EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Ministral-3-3B-Reasoning-2512-GGUF-Q4_K_M
List all available models
lemonade list
------------------------------------------------
- Model Details and Specifications: -
------------------------------------------------
Ministral-3 3B Reasoning 2512 (GGUF)
--------------------
NOTICE:
After testing (post-upload), I noticed that the template does not play nicely
when the model is pulled from HuggingFace: it does not seem to engage the thinking tags correctly.
I will be correcting this today, or tomorrow at the latest!
--------------------
This release contains:
GGUF-converted and quantized model files, compatible with both Ollama and llama.cpp
Quantized GGUF version of:
- Ministral-3-3B-Reasoning-2512-BF16
(by MistralAI)
Original Model Link:
Description:
This release includes GGUF model files (compatible with Ollama and llama.cpp) and two working multi-modal projector
(mmproj) files for the Vision Projector, offering full capabilities in Ollama or llama.cpp.
What is the "Custom Tokenizer Chat Template"?
As opposed to the standard "Chat Template" made available by MistralAI, this release of GGUF converted and quantized files offers a fully
custom Tokenizer Chat Template, intended to provide smoother, faster, more efficient, and more reliable interaction/inference with the model.
This template sheds the "fluff" (non-primary logic) from the JINJA Chat/Tokenizer Template, giving anyone who uses the model
for inference a significant improvement in speed, quality, and context adherence without sacrificing any aspect of the initial release by MistralAI.
For reference, here is the new JINJA Tokenizer Chat-Template:
(This template features a sliding context window of forty-six (46) interactions. To adjust it, change the number forty-seven (47) on
the fourth (4th) line of the template: a higher value widens the sliding context window and a lower value narrows it.)
{{- $remMessage := false }}
[SYSTEM_PROMPT]{{- "🟦 Follow instructions that the user provides. Think and respond to the user in the language they use or request. Next sections describes the capabilities that you have. \n\n🟦 [Reasoning Instructions]\nYou have the ability to think before responding to the user. Always start your response by thinking, using an internal monologue. Always use this template when you respond: <think> thoughts and internal monologue </think> then respond directly to the user.\n\n🟦 [Multi-Modal Instructions]\nYou have the ability to read images." }}[/SYSTEM_PROMPT]
{{- range $index, $_ := .Messages }}
{{- if lt (len (slice $.Messages $index)) 47 }}
{{- $remMessage = true }}
{{- end }}
{{- if $remMessage }}{{- if eq .Role "user" }}
[INST]{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{ .Content }}{{- end }}
{{- end }}
{{- end }}
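To make the window logic easier to follow, here is a rough Python sketch of what the template listing above appears to do. The listing uses Go-template-style syntax (`range`, `slice`, `lt`), as Ollama templates do; this sketch only mirrors its selection rule and is not a replacement for it. A message is kept once fewer than 47 messages remain from its position to the end of the conversation, so at most the last 46 messages are rendered. The function names `select_window` and `render_window` are hypothetical, the fixed system prompt is omitted, and whitespace handling is simplified.

```python
from typing import Dict, List

# The number on the fourth line of the template; the window holds LIMIT - 1 messages.
LIMIT = 47


def select_window(messages: List[Dict[str, str]], limit: int = LIMIT) -> List[Dict[str, str]]:
    """Keep a message only if fewer than `limit` messages remain from its index onward."""
    kept = []
    for index, message in enumerate(messages):
        remaining = len(messages) - index  # mirrors: len (slice $.Messages $index)
        if remaining < limit:
            kept.append(message)
    return kept


def render_window(messages: List[Dict[str, str]], limit: int = LIMIT) -> str:
    """Render kept messages: user turns wrapped in [INST]...[/INST], assistant turns verbatim."""
    parts = []
    for message in select_window(messages, limit):
        if message["role"] == "user":
            parts.append(f"[INST]{message['content']}[/INST]")
        elif message["role"] == "assistant":
            parts.append(message["content"])
        # Other roles are skipped, as the template only handles user and assistant.
    return "".join(parts)
```

For a conversation of 50 messages, `select_window` drops the first four and keeps the remaining 46, which matches the window size stated above.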
No modifications, edits, or configurations are required to use this model with
Ollama or llama.cpp; it works natively! Both Vision and Text work with Ollama as well. (^.^)
Coming Soon!!!
Check back occasionally, as an automated installer/configuration Python-3 script is making its way to all of my releases!
It gives anyone interested in using these models a hassle-free and stress-free experience: the Python-3
script takes care of setting up the model for Ollama (specifically for Ollama; other software optimizations are coming later).
To get the most out of this particular release, it is highly recommended to use the Ollama "create" command along with the supplied ".modelfile"
to ensure proper configuration. The Python-3 automated installer/configuration tool
will handle these aspects if you choose to use it.
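Until that installer is available, the "create" step can be scripted by hand. The sketch below is not the author's installer; it only wraps the standard `ollama create <name> -f <modelfile>` command. The model name and modelfile path in the usage note are hypothetical placeholders, so substitute the name you want and the path of the ".modelfile" you downloaded from this release.

```python
import shutil
import subprocess
from typing import List


def build_create_command(model_name: str, modelfile_path: str) -> List[str]:
    """Return the argument list for `ollama create <model_name> -f <modelfile_path>`."""
    return ["ollama", "create", model_name, "-f", modelfile_path]


def create_model(model_name: str, modelfile_path: str) -> None:
    """Register the model with a local Ollama install using the given modelfile."""
    if shutil.which("ollama") is None:
        raise RuntimeError("The 'ollama' executable was not found on PATH.")
    # check=True raises CalledProcessError if Ollama reports a failure.
    subprocess.run(build_create_command(model_name, modelfile_path), check=True)
```

For example, `create_model("ministral-3-3b-reasoning", "path/to/your.modelfile")` (both values are placeholders) would register the model, after which `ollama run ministral-3-3b-reasoning` starts a chat.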
Happy Inferencing!
-- Jon Z (EnlistedGhost)
Model Updates (As of: March 26, 2026)
- Updated: Uploaded/added all GGUF conversions and non-i-Matrix quantized model files
Final quantized and full-F16 model files are uploaded! Check back for i-Matrix quant model files if you do not see your desired edition (they are being uploaded, thank you for your patience!)
-------------------------------------------------------------
- GGUF Conversion and Quantization Details: -
-------------------------------------------------------------
Software used to convert Safetensors to GGUF:
Software used to create Quantized GGUF Files:
Specific GitHub Commit Point:
Converted to GGUF and Quantized by:
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit
16-bit
Model tree for EnlistedGhost/Ministral-3-3B-Reasoning-2512-GGUF
Base model
mistralai/Ministral-3-3B-Base-2512