---
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER/blob/main/LICENSE
pipeline_tag: text-generation
language:
- en
tags:
- mlx
- lightning-mlx
- mtplx
- qwen3.5
- qwen3_5_moe
- mixture-of-experts
- apple-silicon
- saber
- nsc-ace
- refusal-ablation
- uncensored
base_model: GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER
base_model_relation: quantized
---

# Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed

MLX 6-bit build of [`GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER`](https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER) packaged for fast local serving with [`lightning-mlx`](https://github.com/samuelfaj/lightning-mlx).

The checkpoint includes an MTPLX sidecar (`mtp.safetensors`) and runtime metadata (`mtplx_runtime.json`) so `lightning-mlx` can use its Qwen3.5 MoE MTPLX serving path on Apple Silicon. The runtime metadata was verified on Darwin arm64 with `mtplx_version: 0.1.0rc3`, `mtp_depth_max: 1`, and `recommended_profile: sustained`.
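
Before you serve from a local checkout, you may want to confirm that both MTPLX files are present. Below is a minimal sketch. `check_mtplx_checkout` is a hypothetical helper, not part of `lightning-mlx`, and it assumes that `mtplx_runtime.json` stores the fields quoted above as top-level JSON keys:

```bash
# Hypothetical helper: confirm a local checkout has the MTPLX sidecar and
# runtime metadata, then print the metadata fields quoted in this card.
# It only checks that the files exist; it does not validate tensor contents.
check_mtplx_checkout() {
  local dir="${1%/}" f missing=0
  for f in mtp.safetensors mtplx_runtime.json; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1
  # Assumption: scalar values sit on the same line as their key.
  # grep exits non-zero (and so does this function) if none of the keys is found.
  grep -oE '"(mtplx_version|mtp_depth_max|recommended_profile)"[[:space:]]*:[[:space:]]*[^,}]*' \
    "$dir/mtplx_runtime.json"
}
```

For example, `check_mtplx_checkout /path/to/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed` prints the three fields or reports which file is missing.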

The base model is GestaltLabs's NSC-ACE-SABER variant of Qwen3.6-35B-A3B (Qwen3.5 MoE architecture, 35B total parameters, ~3B active per token). See the [source model card](https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER) for capabilities, license, NSC-ACE-SABER training, and tool-use evaluation.

> **MTP weights**: `mtp.safetensors` is extracted from the F16 GGUF in [`GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP`](https://huggingface.co/GestaltLabs/Qwen3.6-35B-A3B-NSC-ACE-SABER-GGUF-MTP). Its llama.cpp `blk.40.*` / `blk.40.nextn.*` tensors are remapped back to the HF `mtp.*` packed layout, and the norm weights are shifted into the deviation form that `convert-mtplx` expects. The MTP module comes from the NSC-ACE-SABER fine-tune itself. It is not borrowed from a different base model.
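
You can spot-check the remap by reading the sidecar's safetensors header, which lists every tensor name. A safetensors file starts with an 8-byte little-endian length, followed by that many bytes of JSON. Below is a minimal sketch that uses only `od`, `tail`, and `head`. It assumes a little-endian host, and `safetensors_header` is a hypothetical helper name:

```bash
# Hypothetical helper: print the JSON header of a .safetensors file.
# Layout: 8-byte little-endian unsigned length N, then N bytes of JSON.
# od decodes the 8 bytes in host byte order, so this assumes a little-endian
# host (Apple Silicon and x86-64 both are).
safetensors_header() {
  local file="$1" len
  len=$(od -An -tu8 -N8 "$file" | tr -d ' \n')
  [ -n "$len" ] || return 1
  # Skip the 8 length bytes, then emit exactly N bytes of JSON.
  tail -c +9 "$file" | head -c "$len"
}
```

For example, `safetensors_header mtp.safetensors | grep -oE '"mtp\.[^"]*"' | sort -u` lists the `mtp.*` tensor names stored in the sidecar.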

## Install lightning-mlx

```bash
python3 -m pip install git+https://github.com/samuelfaj/lightning-mlx.git
```

Or:

```bash
curl -fsSL https://raw.githubusercontent.com/samuelfaj/lightning-mlx/main/install.sh | bash
```

Verify:

```bash
lightning-mlx --help
```

## Serve this model

From Hugging Face:

```bash
lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed
```

From a local checkout:

```bash
lightning-mlx serve /path/to/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed
```

Daemon mode:

```bash
lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed --daemon
lightning-mlx status
lightning-mlx tui <PID-or-model-name>
lightning-mlx kill <PID-or-model-name>
```
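
In daemon mode the server may need some time to load the checkpoint before it accepts requests. Below is a minimal polling sketch. It assumes the server listens on port 8010, as in this card's API example. `wait_for_server` is a hypothetical helper, and it treats any HTTP response, including an error status, as "up":

```bash
# Hypothetical helper: poll until the server answers HTTP requests or the
# timeout (in seconds, roughly) expires. Without -f, curl exits 0 on any HTTP
# response, so only connection failures keep the loop going.
wait_for_server() {
  local url="${1:-http://localhost:8010}" timeout="${2:-120}" waited=0
  until curl -s -o /dev/null --max-time 2 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server at $url not reachable after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `lightning-mlx serve samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed --daemon && wait_for_server && lightning-mlx status`. This assumes `serve --daemon` returns as soon as the background process starts.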

## OpenAI-compatible API

```bash
curl -N http://localhost:8010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local",
    "messages": [
      {"role": "user", "content": "Write a tiny Python HTTP server."}
    ],
    "stream": true
  }'
```
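
With `"stream": true`, OpenAI-compatible servers send the reply as server-sent events: a series of `data: {...}` lines that ends with `data: [DONE]`. Below is a small plain-bash sketch that extracts the JSON payloads. It assumes `lightning-mlx` follows that OpenAI streaming format, and `sse_data` is a hypothetical helper:

```bash
# Hypothetical helper: read an OpenAI-style SSE stream on stdin and print the
# JSON payload of each "data:" line, stopping at the "[DONE]" sentinel.
# Blank separator lines and comment lines (starting with ":") are ignored.
sse_data() {
  local line
  while IFS= read -r line; do
    line="${line%$'\r'}"   # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;
      "data: "*) printf '%s\n' "${line#data: }" ;;
    esac
  done
}
```

For example, run the request above with `curl -sN` (no output buffering) and pipe it through `sse_data | jq -j '.choices[0].delta.content // empty'` to print only the generated text. This last step requires `jq` to be installed.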

## Why use lightning-mlx

`lightning-mlx` is built for local agent workloads on Apple Silicon: short streamed turns, tool calls, growing context, and repeated low-latency interactions. With this checkpoint it uses the packaged MTPLX metadata and the Qwen3.5 MoE serving preset instead of treating the model as a generic MLX checkpoint.

The runtime focuses on:

- OpenAI-compatible local serving
- Fast streamed chat completions
- Qwen3.5 MoE reasoning and tool-use paths
- MTPLX-style speculative decoding support
- Daemon, status, TUI, and kill controls
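
To get a rough sense of what the speculative decoding support can gain, assume a standard draft-and-verify scheme with one drafted token per step, matching `mtp_depth_max: 1` in the runtime metadata. Let $\alpha$ be the probability that the main model accepts the MTP draft. If the draft is accepted, a verification pass yields the draft token plus one token from the main model. If it is rejected, the pass yields only the main model's token:

$$
\mathbb{E}[\text{tokens per verification pass}] = 2\alpha + 1\cdot(1-\alpha) = 1 + \alpha
$$

As an illustration, a hypothetical acceptance rate of $\alpha = 0.7$ would give about 1.7 tokens per pass. Wall-clock speedup is lower than this ratio because the MTP head and the verification step add their own cost. This card does not report measured acceptance rates.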

## Convert similar local MTPLX models

```bash
lightning-mlx convert-mtplx \
  /path/to/Model-MLX-quantized \
  --mtp-source /path/to/Model-with-mtp-tensors
```

The output is written next to the source directory as `<source>-MTPLX-Optimized-Speed`. Then serve it:

```bash
lightning-mlx serve /path/to/Model-MLX-quantized-MTPLX-Optimized-Speed
```
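
The two steps can be chained in a script. Below is a minimal sketch. `mtplx_output_dir` and `convert_and_serve` are hypothetical helpers that follow the `<source>-MTPLX-Optimized-Speed` naming described above. Stripping trailing slashes from the source path is an assumption about how `convert-mtplx` names its output:

```bash
# Hypothetical helper: derive the output directory name described above.
mtplx_output_dir() {
  local src="$1"
  # Drop trailing slashes so "/a/b/" and "/a/b" map to the same sibling dir.
  while [ "${src%/}" != "$src" ]; do src="${src%/}"; done
  printf '%s-MTPLX-Optimized-Speed\n' "$src"
}

# Hypothetical wrapper: convert, then serve the result only if conversion succeeded.
convert_and_serve() {
  local src="$1" mtp_src="$2"
  lightning-mlx convert-mtplx "$src" --mtp-source "$mtp_src" &&
    lightning-mlx serve "$(mtplx_output_dir "$src")"
}
```

Usage: `convert_and_serve /path/to/Model-MLX-quantized /path/to/Model-with-mtp-tensors`.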

## Use with mlx-lm

This checkpoint is also a standard MLX text-generation model:

```bash
pip install -U mlx-lm
mlx_lm.generate \
  --model samuelfaj/Qwen3.6-35B-A3B-NSC-ACE-SABER-6bit-MTPLX-Optimized-Speed \
  --prompt "Hello" \
  --max-tokens 100
```

## Intended use

Research and red-teaming. SABER ablates refusal behaviors, so deploy the model behind your own policy and logging layer.

## License

Apache 2.0, inherited from the base model.