How to use from
Unsloth Studio
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tarruda/DeepSeek-V4-Flash-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for tarruda/DeepSeek-V4-Flash-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for tarruda/DeepSeek-V4-Flash-GGUF to start chatting
Quick Links

DeepSeek V4 Flash GGUF

GGUF quantizations for deepseek-ai/DeepSeek-V4-Flash

DeepSeek published the original model weights in MXFP4, so the MXFP4 GGUFs in this repo are direct conversions of those original safetensors.

Quant Recipes

Recipe Quant Size Default type Tensor-specific overrides
Q3_K 119297.23 MiB (3.52 BPW) Q6_K ffn_down_exps=q3_k, ffn_gate_exps=q3_k, ffn_up_exps=q3_k
IQ3_XXS 106912.23 MiB (3.15 BPW) Q6_K ffn_down_exps=iq3_xxs, ffn_gate_exps=iq3_xxs, ffn_up_exps=iq3_xxs
Q2_K 92465.23 MiB (2.73 BPW) Q6_K ffn_down_exps=q2_k, ffn_gate_exps=q2_k, ffn_up_exps=q2_k

Usage

This is the script I use to run:

#!/bin/sh -e

model="./IQ3_XXS/DeepSeek-V4-Flash-IQ3_XXS-00001-of-00004.gguf"

ctx=131072
parallel=1

ctx_size=$((ctx * parallel))

llama-server --no-mmap --no-warmup \
  --model $model --ctx-size $ctx_size -np $parallel \
  --temp 1.0 --top-p 1.0
Downloads last month
-
GGUF
Model size
284B params
Architecture
deepseek4
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for tarruda/DeepSeek-V4-Flash-GGUF

Quantized
(84)
this model