Qwythos-9B-Claude-Mythos-5-1M-mxfp4-mlx

MLX quantization of empero-ai/Qwythos-9B-Claude-Mythos-5-1M for Apple Silicon.

Note — text tower only. The source model is a Qwen3.5-VL multimodal model (Qwen3_5ForConditionalGeneration, with a vision encoder). This MLX conversion contains only the text/language tower — the vision encoder weights are not included, so this is a text-only model and does not accept image or video input. The text reasoning the original is benchmarked for (GSM8K, MMLU) is unaffected.

Variant: Block float MX FP4
Disk size: 4557 MB
Quantized by: sahilchachra

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

	This model	FP16 baseline
Decode tok/s (avg, long traces)	60.03	N/A
Peak memory (GB)	5.245	N/A
Disk size (MB)	4557	17969

Quality

Benchmark	This model	FP16 baseline	n
GSM8K (math, accuracy)	92.0%	N/A	50
MMLU (knowledge, accuracy)	74.0%	N/A	50

Context scaling (decode tok/s)

Context length	Decode tok/s
~128 tokens	60.9
~256 tokens	60.6
~512 tokens	60.4
~1024 tokens	60.6

Usage

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-mxfp4-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)

All variants in this collection

Model	Variant
sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-mxfp4-mlx	Block float MX FP4 ← this model
sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-mxfp8-mlx	Block float MX FP8
sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-optiq-5bpw-mlx	OptiQ mixed-precision (target 5.0 bpw)

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M5 Pro, 24 GB unified memory
License: see empero-ai/Qwythos-9B-Claude-Mythos-5-1M for the original model's license