Qwen3-Coder-30B-A3B-Instruct-GGUF

MXFP4_MOE Quantization

This repository contains an MXFP4_MOE-quantized GGUF file of Qwen3-Coder-30B-A3B-Instruct.

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen3-Coder-30B-A3B-Instruct |
| Architecture | Qwen3MoE |
| Parameters | 30B (3.6B active, 128 experts, 8 activated) |
| Quantization | MXFP4_MOE (OCP MXFP4 E2M1, block 32, shared 8-bit block exponents) |
| BPW | 4.47 |
| File Size | ~17.1 GB |
| Context Length | 32K |
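
The BPW figure sits above a pure 4-bit rate because each MXFP4 block also stores a shared exponent, and the non-expert tensors use wider types. The sketch below is a rough back-of-envelope check, not an exact accounting: it assumes every weight is stored either as MXFP4 (32 four-bit codes plus one 8-bit exponent) or as Q8_0 (32 eight-bit values plus one 16-bit scale, which is ggml's usual Q8_0 layout), and it ignores the tensors kept in F32/F16. The function name `mxfp4_share` is made up for this illustration.

```python
BLOCK_SIZE = 32

# Bits per weight of one block, including the per-block scale.
MXFP4_BPW = (BLOCK_SIZE * 4 + 8) / BLOCK_SIZE   # 4.25
Q8_0_BPW = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE   # 8.5 (assumes an fp16 scale per block)


def mxfp4_share(average_bpw: float) -> float:
    """Fraction of weights that must be MXFP4 for a two-type mix to hit average_bpw.

    Solves average = s * MXFP4_BPW + (1 - s) * Q8_0_BPW for s.
    """
    if not MXFP4_BPW <= average_bpw <= Q8_0_BPW:
        raise ValueError("average must lie between the two per-type rates")
    return (Q8_0_BPW - average_bpw) / (Q8_0_BPW - MXFP4_BPW)


if __name__ == "__main__":
    print(f"MXFP4 share implied by 4.47 BPW: {mxfp4_share(4.47):.1%}")
```

Under these simplifications, an average of 4.47 BPW implies that roughly 95% of the weights are stored as MXFP4, which fits a mixture-of-experts model where the expert tensors dominate the parameter count.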

Download

huggingface-cli download FreedomAISVR/Qwen3-Coder-30B-A3B-MXFP4-MOE-GGUF qwen3-coder-30b-a3b-mxfp4_moe.gguf --local-dir . --local-dir-use-symlinks False

Quantization Information

This model uses MXFP4 (Microscaling FP4) quantization via llama.cpp's MXFP4_MOE type:

  • E2M1 format: 1 sign bit, 2 exponent bits, 1 mantissa bit
  • Block size: 32 elements sharing an 8-bit block exponent
  • Expert weights: Quantized to MXFP4 (3 ffn_exps tensors per layer)
  • Attention weights: Quantized to Q8_0 (8-bit block quantization)
  • Other weights: Kept in F32/F16
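
To make the format concrete, here is a minimal sketch of how one MXFP4 block expands back to floats. It follows the E2M1 value table and the power-of-two shared scale (bias 127) from the OCP microscaling definition. The nibble packing order and the function names are assumptions chosen for illustration, so do not treat this as llama.cpp's actual kernel or in-memory layout.

```python
# E2M1 magnitudes indexed by the 3 low bits (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def dequantize_block(scale_exponent: int, packed: bytes) -> list[float]:
    """Expand one block: a shared exponent byte plus 16 bytes holding 32 codes.

    Assumed packing (illustrative only): byte j carries element j in its low
    nibble and element j + 16 in its high nibble.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 packed bytes per block")
    if not 0 <= scale_exponent <= 254:
        raise ValueError("exponent byte 255 encodes NaN and is not handled here")
    scale = 2.0 ** (scale_exponent - 127)  # 8-bit exponent-only scale, bias 127
    low = [decode_e2m1(b & 0x0F) * scale for b in packed]
    high = [decode_e2m1(b >> 4) * scale for b in packed]
    return low + high
```

For example, `dequantize_block(127, bytes([0x12] * 16))` has a scale of 1 and yields sixteen values of 1.0 followed by sixteen values of 0.5. Because the scale is a pure power of two, only the eight magnitudes above (times the sign) are representable within a block.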

Verification

After downloading, verify the file's MD5 checksum:

echo "9f5a07e402df2aa16b9b4fcee22b5132 *qwen3-coder-30b-a3b-mxfp4_moe.gguf" | md5sum -c
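
On systems without `md5sum`, the same check can be done with Python's standard library. This is a minimal sketch: the helper names `md5_of_file` and `verify` are made up here, and the expected digest is the one listed above. The file is read in chunks so the roughly 17 GB model never has to fit in memory.

```python
import hashlib

# MD5 digest published in this card for qwen3-coder-30b-a3b-mxfp4_moe.gguf.
EXPECTED_MD5 = "9f5a07e402df2aa16b9b4fcee22b5132"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, streaming it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file's digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify("qwen3-coder-30b-a3b-mxfp4_moe.gguf")` from the download directory should return `True` for an intact file. MD5 is adequate for catching a truncated or corrupted download, but it is not a defense against deliberate tampering.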

Credits

  • Qwen Team for the base model
  • llama.cpp for the GGUF format and quantization tools
  • MXFP4 is the 4-bit microscaling floating-point format standardized by the Open Compute Project (OCP), backed by AMD, NVIDIA, Microsoft, Meta, and OpenAI