---
license: apache-2.0
library_name: mlx
tags:
- mlx
- krill
- qwen3_5
- qwythos
- vision-language
- nvfp4
- long-context
base_model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
pipeline_tag: image-text-to-text
---

# Qwythos-9B-Claude-Mythos-5-1M: MLX nvfp4 (complete VLM, Krill-native)

A mixed-precision **nvfp4** (group size 16) quantization of **[empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)**, a Qwen3.5-class hybrid vision-language model.

> Original model and weights by **[empero-ai](https://huggingface.co/empero-ai)** ([Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)). Full credit to them; this repo only re-quantizes their model.

## Why this build

- 👁️ **Complete vision-language model, vision tower included.** This build keeps the full VLM (text decoder + vision tower) rather than stripping it down to a text-only model.
- 🎯 **nvfp4 mixed precision.** The decoder is nvfp4 at group size 16, with `down_proj` and `o_proj` **protected at 8-bit** and the **vision tower kept at higher precision**. Smaller and faster than int4 at comparable quality.
- ⚡ **Native Krill runtime.** The text decoder runs as a **native Swift + MLX** model on Apple Silicon, using Krill's from-scratch runtime for the Qwen3.5 hybrid GatedDeltaNet (SSM) + full-attention decoder, not an mlx_vlm passthrough.
- 🧵 **Long context.** 262K tokens natively (up to 1M via YaRN RoPE scaling, as in the upstream model).

## Run in Krill (recommended)

```bash
# install Krill
brew tap srvsngh99/krill && brew install krill
# or: curl -fsSL https://raw.githubusercontent.com/srvsngh99/Krill/main/install.sh | sh

# run Qwythos nvfp4 (pulls this repo)
krill run qwythos-9b-nvfp4 "Give three tips for staying focused while studying."

krill update
```

## Run with mlx_vlm (text + vision)

```bash
pip install -U mlx-vlm
python -m mlx_vlm.generate --model srv-sngh/Qwythos-9B-Claude-Mythos-5-1M-mlx-nvfp4 \
  --prompt "Describe this image." \
  --image path/to/image.jpg --max-tokens 200
```

## About the base model

A **Qwen3.5-class hybrid** VLM: the text decoder interleaves **GatedDeltaNet linear-attention (SSM) layers** with a full softmax-attention layer at every fourth position, plus a vision tower. Full credit to the original creators, **[empero-ai](https://huggingface.co/empero-ai)**.

## Quantization

| field | value |
|-------|-------|
| format | MLX nvfp4 (mixed precision) |
| group size | 16 |
| protected | `down_proj`, `o_proj` @ 8-bit affine; vision tower at higher precision |
| size | ~6.4 GB |
| contents | complete VLM (text decoder + vision tower) |

In Krill, the **text decoder runs natively**; the **vision tower currently runs via mlx_vlm** (native vision is a follow-up).

## License

apache-2.0, matching the base model [empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M).
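As a rough illustration of the storage trade-off in the Quantization table, the sketch below computes effective bits per weight for group-wise formats. It assumes the standard NVFP4 layout (4-bit values with one 8-bit FP8 E4M3 scale per 16 weights, no bias, any per-tensor scale ignored) and, for the protected layers, MLX-style 8-bit affine quantization with a 16-bit scale and a 16-bit bias per group at a hypothetical group size of 64, since the card does not state one. The `QuantScheme` class and the 4096 × 4096 matrix shape are illustrative only; they are not part of Krill or MLX, and this is not the exact on-disk layout.

```python
# Rough storage arithmetic for group-wise quantization.
# A sketch under stated assumptions, not MLX's exact on-disk layout.
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantScheme:
    name: str
    bits: int        # bits per quantized weight
    group_size: int  # number of weights sharing one scale (and bias, if any)
    scale_bits: int  # storage for the per-group scale
    bias_bits: int   # storage for the per-group bias (0 if the format has none)

    def bits_per_weight(self) -> float:
        # Each weight costs `bits`, plus its share of the per-group metadata.
        return self.bits + (self.scale_bits + self.bias_bits) / self.group_size

    def matrix_bytes(self, rows: int, cols: int) -> float:
        # Groups are assumed to run along the input dimension (`cols`).
        if cols % self.group_size:
            raise ValueError("cols must be a multiple of group_size")
        return rows * cols * self.bits_per_weight() / 8


# ASSUMPTION: nvfp4 = 4-bit values + one 8-bit scale per 16 weights, no bias.
NVFP4_G16 = QuantScheme("nvfp4 g16", bits=4, group_size=16, scale_bits=8, bias_bits=0)
# ASSUMPTION: 8-bit affine with 16-bit scale + 16-bit bias; group size 64 is hypothetical.
AFFINE8_G64 = QuantScheme("affine 8-bit g64", bits=8, group_size=64, scale_bits=16, bias_bits=16)
# Unquantized baseline: no per-group metadata, so group_size is irrelevant (set to 1).
BF16 = QuantScheme("bf16 (unquantized)", bits=16, group_size=1, scale_bits=0, bias_bits=0)

if __name__ == "__main__":
    rows, cols = 4096, 4096  # hypothetical projection shape, for illustration only
    for scheme in (NVFP4_G16, AFFINE8_G64, BF16):
        mib = scheme.matrix_bytes(rows, cols) / 2**20
        print(f"{scheme.name:20s} {scheme.bits_per_weight():5.2f} bits/weight  {mib:6.1f} MiB")
```

Under these assumptions an nvfp4 weight costs 4.5 bits and a protected 8-bit affine weight costs 8.5 bits, so each protected matrix takes a bit under twice the storage of an nvfp4 one.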