Qwythos-9B-Claude-Mythos-5-1M-mxfp8-mlx

MLX quantization of empero-ai/Qwythos-9B-Claude-Mythos-5-1M for Apple Silicon.

Note: text tower only. The source model is a Qwen3.5-VL multimodal model (Qwen3_5ForConditionalGeneration, with a vision encoder). This MLX conversion contains only the text/language tower; the vision encoder weights are not included, so this is a text-only model and does not accept image or video input. Dropping the vision encoder does not affect the text-only tasks the original is benchmarked on (GSM8K, MMLU).

Variant: Block float MX FP8
Disk size: 8826 MB
Quantized by: sahilchachra
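"Block float MX FP8" presumably refers to the Microscaling (MX) FP8 format from the Open Compute Project MX specification: every block of 32 values shares one power-of-two scale stored as an 8-bit E8M0 exponent, and each value is stored as an 8-bit float. The pure-Python sketch below illustrates that idea under stated assumptions: it uses the E4M3 element type (MXFP8 also permits E5M2) and the spec's shared-scale rule, and every name in it is hypothetical. It is not MLX's kernel and may not match MLX's rounding or scale selection bit for bit.

```python
import math

BLOCK_SIZE = 32       # values per MX block (OCP MX spec)
E4M3_MAX = 448.0      # largest finite FP8 E4M3 magnitude
E4M3_EMAX = 8         # exponent of E4M3's largest power of two (448 = 1.75 * 2**8)
E4M3_MIN_EXP = -6     # smallest normal exponent; subnormals reuse this spacing
E4M3_MANT_BITS = 3
E8M0_MIN, E8M0_MAX = -127, 127  # representable shared-scale exponents


def round_to_e4m3(x: float) -> float:
    """Round a non-NaN x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    if math.isinf(x):
        return math.copysign(E4M3_MAX, x)
    a = abs(x)
    _, e = math.frexp(a)                   # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)         # floor(log2(a)), clamped to the subnormal range
    step = 2.0 ** (exp - E4M3_MANT_BITS)   # spacing between adjacent E4M3 values here
    q = min(round(a / step) * step, E4M3_MAX)  # round() ties to even; then saturate
    return math.copysign(q, x)


def mx_quantize_block(block):
    """Quantize one block: returns (shared exponent, E4M3-rounded elements)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - E4M3_EMAX       # floor(log2(amax)) - emax of the element type
    shared_exp = max(E8M0_MIN, min(E8M0_MAX, shared_exp))
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e4m3(v / scale) for v in block]


def mx_dequantize_block(shared_exp, elements):
    """Multiply each stored element back by the block's power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [scale * q for q in elements]


def mx_quantize(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each block independently."""
    return [mx_quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


def bits_per_element(block_size=BLOCK_SIZE):
    # 8-bit element plus one 8-bit shared scale amortized over the block
    return 8 + 8 / block_size
```

The shared scale costs 8 bits per 32 values, so storage is 8.25 bits per quantized element. Under this scale rule, a block whose largest magnitude has a mantissa above 1.75 saturates (a block filled with 1.9 decodes to 1.75); some implementations choose the scale differently to avoid that.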

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

Metric	This model	FP16 baseline
Decode tok/s (avg, long traces)	30.67	N/A
Peak memory (GB)	9.599	N/A
Disk size (MB)	8826	17969
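The numbers in this table support a couple of quick back-of-envelope figures. The sketch below copies them into hypothetical constants; `decode_seconds` assumes constant decode throughput and ignores prompt processing (prefill), so treat its output as a rough lower bound on latency rather than a measurement.

```python
# Figures copied from the Performance table above.
DECODE_TOK_S = 30.67   # average decode throughput, this model
DISK_MB = 8826         # this model
FP16_DISK_MB = 17969   # FP16 baseline


def decode_seconds(num_tokens: int, tok_per_s: float = DECODE_TOK_S) -> float:
    """Rough decode time for num_tokens at a fixed rate (excludes prefill)."""
    return num_tokens / tok_per_s


size_ratio = FP16_DISK_MB / DISK_MB
print(f"Disk: {size_ratio:.2f}x smaller than FP16")
print(f"256 new tokens: ~{decode_seconds(256):.1f} s of decode")
```

With these inputs the MXFP8 weights need about half the disk space of the FP16 baseline (a ratio of roughly 2.04), and 256 new tokens take roughly 8.3 s of decode time.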

Quality

Benchmark	This model	FP16 baseline	Samples (n)
GSM8K (math, accuracy)	100.0%	N/A	50
MMLU (knowledge, accuracy)	80.0%	N/A	50
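Each accuracy figure is computed on 50 questions, so a single question moves the score by 2 percentage points and the sampling uncertainty is wide. As an illustration, the sketch below computes a Wilson score interval, a standard confidence interval for a binomial proportion; the choice of the Wilson method and of the 95% level (z = 1.96) is an assumption added here, not part of the original evaluation. The correct counts (50/50 and 40/50) follow directly from the table.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = correct / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


for name, correct in [("GSM8K", 50), ("MMLU", 40)]:
    lo, hi = wilson_interval(correct, 50)
    print(f"{name}: {correct}/50 -> 95% CI [{lo:.1%}, {hi:.1%}]")
```

For these results the interval is roughly 93% to 100% for GSM8K and roughly 67% to 89% for MMLU, so small score differences between variants at n = 50 should be read with caution.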

Context scaling (decode tok/s)

Context length	Decode tok/s
~128 tokens	33.7
~256 tokens	33.6
~512 tokens	33.6
~1024 tokens	33.5
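Across the measured range (about 128 to 1024 tokens of context) decode speed is essentially flat; the table does not cover longer contexts, so it says nothing about throughput there. The small sketch below quantifies the change using the table's values; `relative_slowdown` is a hypothetical helper.

```python
# (context length in tokens, decode tok/s) from the Context scaling table.
CONTEXT_SCALING = [(128, 33.7), (256, 33.6), (512, 33.6), (1024, 33.5)]


def relative_slowdown(rows):
    """Percent drop in decode throughput from the shortest to the longest context."""
    (_, first), (_, last) = rows[0], rows[-1]
    return 100.0 * (first - last) / first


print(f"{relative_slowdown(CONTEXT_SCALING):.2f}% slower at ~1024 vs ~128 tokens")
```

The drop works out to about 0.6%, well within what run-to-run noise could plausibly explain.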

Usage

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-mxfp8-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)

All variants in this collection

Model	Variant
sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-mxfp4-mlx	Block float MX FP4
sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-mxfp8-mlx	Block float MX FP8 ← this model
sahilchachra/Qwythos-9B-Claude-Mythos-5-1M-optiq-5bpw-mlx	OptiQ mixed-precision (target 5.0 bpw)

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M5 Pro, 24 GB unified memory
License: see empero-ai/Qwythos-9B-Claude-Mythos-5-1M for the original model's license