
Melvin56/Qwen3-4B-ik_GGUF

Quant for ik_llama.cpp

Build: 3680 (a2d24c97)

Original Model : Qwen/Qwen3-4B

All of these quants were created with an importance matrix (imatrix) computed from this Dataset.

Perplexity
Perplexity measurement methodology.
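
For reference, the figures reported below follow the usual notion of perplexity: the exponential of the average negative log-likelihood the model assigns to each token of the test text. This is the textbook definition, not a description of how llama-perplexity chunks the text internally; the symbols N (token count) and p_θ (the model's predictive distribution) are introduced here for illustration.

```latex
\mathrm{PPL}(x_1,\dots,x_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(x_i \mid x_{<i}\right) \right)
```

Lower values mean the model predicts the test text better.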

I tested all quants using ik_llama.cpp build 3680 (a2d24c97)

ik_llama.cpp/build/bin/llama-perplexity \
-m .gguf \
--ctx-size 512 \
--ubatch-size 512 \
-f wikitext-2-raw/wiki.test.raw \
-fa \
-ngl 999
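
To repeat the same measurement over several quant files, the command above can be generated programmatically. The following is a minimal sketch: the flags are copied from the command above, while the function name `perplexity_command` and the file name `example.gguf` are hypothetical placeholders, not part of this repository.

```python
import shlex

# Binary path as it appears in the command above.
LLAMA_PERPLEXITY = "ik_llama.cpp/build/bin/llama-perplexity"


def perplexity_command(model_path, test_file="wikitext-2-raw/wiki.test.raw"):
    """Return the argument list for one perplexity run.

    The flags mirror the command shown above; model_path is whatever
    GGUF file you want to test (hypothetical in this sketch).
    """
    return [
        LLAMA_PERPLEXITY,
        "-m", str(model_path),
        "--ctx-size", "512",
        "--ubatch-size", "512",
        "-f", test_file,
        "-fa",
        "-ngl", "999",
    ]


# Dry run: print the command instead of executing it.
print(shlex.join(perplexity_command("example.gguf")))
```

Passing the returned list to `subprocess.run` would execute the measurement; printing it first makes it easy to check the flags before starting a long run.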

Raw data

Quant     Size (GB)   PPL
BF16      8.05        14.3308 +/- 0.13259
IQ6_K     3.34        14.2810 +/- 0.13159
IQ5_K     2.82        14.5004 +/- 0.13465
IQ4_K     2.38        14.5280 +/- 0.13414
IQ4_KS    2.22        15.2121 +/- 0.14294
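
The raw numbers are easier to compare when expressed relative to the BF16 baseline. The sketch below does that arithmetic on the values from the table above; the helper name `compare_to_baseline` is made up for this example.

```python
# (quant, size in GB, perplexity, reported +/- value), copied from the table above.
RESULTS = [
    ("BF16", 8.05, 14.3308, 0.13259),
    ("IQ6_K", 3.34, 14.2810, 0.13159),
    ("IQ5_K", 2.82, 14.5004, 0.13465),
    ("IQ4_K", 2.38, 14.5280, 0.13414),
    ("IQ4_KS", 2.22, 15.2121, 0.14294),
]


def compare_to_baseline(results, baseline="BF16"):
    """Return size ratio and percentage perplexity change per quant."""
    by_name = {name: (size, ppl) for name, size, ppl, _ in results}
    base_size, base_ppl = by_name[baseline]
    summary = {}
    for name, size, ppl, _ in results:
        summary[name] = {
            # Fraction of the baseline file size.
            "size_ratio": size / base_size,
            # Positive means worse (higher) perplexity than the baseline.
            "ppl_change_pct": 100.0 * (ppl - base_ppl) / base_ppl,
        }
    return summary


for name, row in compare_to_baseline(RESULTS).items():
    print(f"{name:7s} size x{row['size_ratio']:.2f}  PPL {row['ppl_change_pct']:+.2f}%")
```

On these figures, IQ4_K is about 30% of the BF16 size at roughly 1.4% higher perplexity, while IQ4_KS is about 6% higher. The IQ6_K result is slightly below BF16, but the gap (about 0.05) is smaller than the reported +/- values, so it should not be read as a real improvement.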

CPU (AVX2) CPU (ARM NEON) Metal cuBLAS rocBLAS SYCL CLBlast Vulkan Kompute
K-quants ✅ 🐢⁵ ✅ 🐢⁵
I-quants ✅ 🐢⁴ ✅ 🐢⁴ ✅ 🐢⁴ Partial¹
✅: feature works
🚫: feature does not work
❓: unknown, please contribute if you can test it yourself
🐢: feature is slow
¹: IQ3_S and IQ1_S, see #5886
²: Only with -ngl 0
³: Inference is 50% slower
⁴: Slower than K-quants of comparable size
⁵: Slower than cuBLAS/rocBLAS on similar cards
⁶: Only q8_0 and iq4_nl
Format: GGUF
Model size: 4B params
Architecture: qwen3