---
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER/blob/main/LICENSE
pipeline_tag: text-generation
language:
- en
tags:
- mlx
- lightning-mlx
- mtplx
- qwen3.5
- qwen3_5_moe
- mixture-of-experts
- apple-silicon
- saber
- nsc-ace
- refusal-ablation
- uncensored
base_model: GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER
base_model_relation: quantized
---

# Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed

MLX 6-bit build of [`GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER`](https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER), packaged for fast local serving with [`lightning-mlx`](https://github.com/samuelfaj/lightning-mlx).

The checkpoint includes an MTPLX sidecar (`mtp.safetensors`) and runtime metadata (`mtplx_runtime.json`) so that `lightning-mlx` can use its Qwen3.5 MoE MTPLX serving path on Apple Silicon. The runtime metadata was verified on Darwin arm64 with `mtplx_version: 0.1.0rc3`, `mtp_depth_max: 1`, and `recommended_profile: sustained`.

The base model is GestaltLabs's NSC-ACE-SABER variant of Qwen3.6-35B-A3B (Qwen3.5 MoE, 35B total / ~3B active parameters per token). Refer to the [source model card](https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER) for capabilities, license, NSC-ACE-SABER training, and tool-use evaluation.

> **MTP weights**: `mtp.safetensors` is extracted from the F16 GGUF in [`GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP`](https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP) and remapped from the llama.cpp `blk.40.*` / `blk.40.nextn.*` tensors back to the HF `mtp.*` packed layout. Norm weights are shifted into the deviation form that `convert-mtplx` expects. The MTP module comes from the NSC-ACE-SABER fine-tune itself, not from a different base model.
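To check which MTPLX settings a local checkout carries, a minimal sketch like the one below prints the three runtime fields listed above. It assumes (this card does not confirm it) that `mtplx_runtime.json` stores `mtplx_version`, `mtp_depth_max`, and `recommended_profile` as top-level keys, and that `python3` is on `PATH`. `show_mtplx_metadata` is a hypothetical helper, not part of `lightning-mlx`.

```bash
# Hypothetical helper (not part of lightning-mlx): print MTPLX runtime fields.
# Assumes the keys are top-level entries in mtplx_runtime.json.
show_mtplx_metadata() {
  local meta="${1%/}/mtplx_runtime.json"
  if [ ! -f "$meta" ]; then
    echo "not found: $meta" >&2
    return 1
  fi
  # Parse with the Python standard library so jq is not required.
  python3 - "$meta" <<'EOF'
import json
import sys

with open(sys.argv[1]) as fh:
    meta = json.load(fh)
for key in ("mtplx_version", "mtp_depth_max", "recommended_profile"):
    print(f"{key}: {meta.get(key, '<absent>')}")
EOF
}
```

Usage: `show_mtplx_metadata /path/to/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed`. A key missing from the file is reported as `<absent>` rather than failing.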
## Install lightning-mlx

```bash
python3 -m pip install git+https://github.com/samuelfaj/lightning-mlx.git
```

Or:

```bash
curl -fsSL https://raw.githubusercontent.com/samuelfaj/lightning-mlx/main/install.sh | bash
```

Verify:

```bash
lightning-mlx --help
```

## Serve this model

From Hugging Face:

```bash
lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed
```

From a local checkout:

```bash
lightning-mlx serve /path/to/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed
```

Daemon mode:

```bash
# Start the server in the background
lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed --daemon
# Check on the running daemon
lightning-mlx status
# Open the terminal UI
lightning-mlx tui
# Stop the daemon
lightning-mlx kill
```

## OpenAI-compatible API

```bash
curl http://localhost:8010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [
      {"role": "user", "content": "Write a tiny Python HTTP server."}
    ],
    "stream": true
  }'
```

## Why use lightning-mlx

`lightning-mlx` is built for local agent workloads on Apple Silicon: short streamed turns, tool calls, growing context, and repeated low-latency interactions. With this checkpoint, it uses the packaged MTPLX metadata and the Qwen3.5 MoE serving preset instead of treating the model as a generic MLX checkpoint.

The runtime focuses on:

- OpenAI-compatible local serving
- Fast streamed chat completions
- Qwen3.5 MoE reasoning and tool-use paths
- MTPLX-style speculative decoding support
- Daemon, status, TUI, and kill controls

## Convert similar local MTPLX models

```bash
lightning-mlx convert-mtplx \
  /path/to/Model-MLX-quantized \
  --mtp-source /path/to/Model-with-mtp-tensors
```

The output is written next to the source directory, with `-MTPLX-Optimized-Speed` appended to its name.
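Before serving, you can confirm that the converted directory sits where the naming rule above puts it and that it contains the same two MTPLX files this checkpoint ships with. This is a minimal sketch: it assumes `convert-mtplx` writes `mtp.safetensors` and `mtplx_runtime.json` into its output directory (inferred from this checkpoint's layout, not from converter documentation), and `mtplx_output_dir` and `check_mtplx_output` are hypothetical helpers.

```bash
# Hypothetical helper: derive the converted path by appending the suffix
# to the source directory name (a trailing slash is stripped first).
mtplx_output_dir() {
  printf '%s-MTPLX-Optimized-Speed\n' "${1%/}"
}

# Hypothetical helper: fail if the converted directory or its MTPLX files
# are missing; on success, print the path to pass to `lightning-mlx serve`.
check_mtplx_output() {
  local out f
  out="$(mtplx_output_dir "$1")"
  if [ ! -d "$out" ]; then
    echo "no converted output at: $out" >&2
    return 1
  fi
  for f in mtp.safetensors mtplx_runtime.json; do
    if [ ! -f "$out/$f" ]; then
      echo "missing $f in $out" >&2
      return 1
    fi
  done
  echo "$out"
}
```

For example, `check_mtplx_output /path/to/Model-MLX-quantized` prints `/path/to/Model-MLX-quantized-MTPLX-Optimized-Speed` on success.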
Then serve the converted checkpoint:

```bash
lightning-mlx serve /path/to/Model-MLX-quantized-MTPLX-Optimized-Speed
```

## Use with mlx-lm

This checkpoint is also a standard MLX text-generation model:

```bash
pip install -U mlx-lm

mlx_lm.generate \
  --model samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed \
  --prompt "Hello" \
  --max-tokens 100
```

## Intended use

Research and red-teaming. SABER ablates refusal behaviors, so deploy this model behind your own policy and logging layer.

## License

Apache 2.0, inherited from the base model.