
This repository contains improved Mixtral-8x7B quantized models in GGUF format for use with llama.cpp. The models are fully compatible with the official llama.cpp release and can be used out of the box.

The table compares these models with the current llama.cpp quantization approach, using Wikitext perplexity (PPL) at a context length of 512 tokens. The "Quantization error" columns are defined as (PPL(quantized model) - PPL(int8)) / PPL(int8). Running the full fp16 Mixtral-8x7B model on the systems I have available takes too long, so I compare against the 8-bit quantized model, for which I get PPL = 4.1049. From past experience, 8-bit quantization should be essentially equivalent to fp16.
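As a reminder of what the PPL columns measure, here is a minimal Python sketch of perplexity as the exponential of the mean per-token negative log-likelihood. It assumes you already have natural-log token probabilities from a model; the function names are hypothetical, and the sketch does not reproduce the exact chunking and scoring details of llama.cpp's perplexity tool.

```python
import math
from collections.abc import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token).

    `token_logprobs` holds natural-log probabilities log p(x_i | x_<i)
    that the model assigned to each evaluated token (hypothetical input).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def chunked_perplexity(chunks: Sequence[Sequence[float]]) -> float:
    """Pool the scored tokens from several fixed-size context windows
    (e.g. 512 tokens each) and report one perplexity over all of them."""
    pooled = [lp for chunk in chunks for lp in chunk]
    return perplexity(pooled)


# If the model gives every token probability 1/4, the perplexity is 4.
print(perplexity([math.log(0.25)] * 10))
```

Lower is better: a quantized model that assigns slightly lower probabilities to the reference text ends up with a slightly higher PPL than the 8-bit baseline.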

| Quantization | Model file | PPL (llama.cpp) | Quantization error (llama.cpp) | PPL (new quants) | Quantization error (new quants) |
|---|---|---|---|---|---|
| Q2_K | mixtral-8x7b-q2k.gguf | 7.4660 | 81.9% | 5.0576 | 23.2% |
| Q3_K_S | mixtral-8x7b-q3k-small.gguf | 4.4601 | 8.65% | 4.3848 | 6.82% |
| Q3_K_M | mixtral-8x7b-q3k-medium.gguf | 4.4194 | 7.66% | 4.2884 | 4.47% |
| Q4_K_S | mixtral-8x7b-q4k-small.gguf | 4.2523 | 3.59% | 4.1764 | 1.74% |
| Q4_K_M | mistral-8x7b-q4k-medium.gguf | 4.2523 | 3.59% | 4.1652 | 1.47% |
| Q5_K_S | mixtral-7b-q5k-small.gguf | 4.1395 | 0.84% | 4.1278 | 0.56% |
| Q4_0 | mixtral-8x7b-q40.gguf | 4.2232 | 2.88% | 4.2001 | 2.32% |
| Q4_1 | mistral-8x7b-q41.gguf | 4.2547 | 3.65% | 4.1713 | 1.62% |
| Q5_0 | mistral-8x7b-q50.gguf | 4.1426 | 0.92% | 4.1335 | 0.70% |
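The error columns can be recomputed directly from the perplexities. The Python sketch below applies the stated definition, with PPL(int8) = 4.1049 as the reference, to a few rows of the table; the names `quantization_error` and `rows` are hypothetical. It reproduces the tabulated percentages up to rounding.

```python
# Reference perplexity of the 8-bit model (stands in for fp16, as described above).
PPL_INT8 = 4.1049


def quantization_error(ppl_quantized: float, ppl_reference: float = PPL_INT8) -> float:
    """Relative perplexity increase: (PPL(quantized) - PPL(int8)) / PPL(int8)."""
    return (ppl_quantized - ppl_reference) / ppl_reference


# A few rows from the table: name -> (PPL with llama.cpp quants, PPL with new quants).
rows = {
    "Q2_K": (7.4660, 5.0576),
    "Q3_K_S": (4.4601, 4.3848),
    "Q5_0": (4.1426, 4.1335),
}

for name, (ppl_old, ppl_new) in rows.items():
    print(
        f"{name}: llama.cpp {quantization_error(ppl_old):.2%}, "
        f"new quants {quantization_error(ppl_new):.2%}"
    )
```

Because the error is relative to the 8-bit baseline, it isolates the cost of quantizing below 8 bits rather than the model's absolute quality on Wikitext.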
Model size: 47B params
Architecture: llama