Laguna-M.1-MLX-Q3

A community MLX (Apple Silicon) build of Poolside's Laguna M.1 — a 225B-total / 23B-active Mixture-of-Experts model — quantized to 3-bit so it runs locally on a 128 GB Mac.
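The 3-bit choice follows from simple arithmetic on the 225B total parameters. The sketch below makes some assumptions that do not come from this card. It assumes MLX-style affine quantization with one 16-bit scale and one 16-bit bias per group of 64 weights; the group size used for this build is not stated here. It also treats every parameter as quantized. `bits_per_weight` and `weight_gb` are illustrative helpers, not part of mlx-lm.

```python
# Back-of-envelope weight memory for a quantized model.
# ASSUMPTIONS (not from this card): MLX-style affine quantization storing one
# 16-bit scale and one 16-bit bias per group of `group_size` weights, and every
# parameter quantized. In practice norms, router and embeddings may be kept at
# higher precision.

def bits_per_weight(bits: int, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage cost per weight, including the per-group scale and bias."""
    return bits + (scale_bits + bias_bits) / group_size


def weight_gb(n_params: float, bits: int, group_size: int = 64) -> float:
    """Approximate weight storage in decimal gigabytes (1e9 bytes)."""
    return n_params * bits_per_weight(bits, group_size) / 8 / 1e9


if __name__ == "__main__":
    total_params = 225e9  # 225B total parameters, from the card
    for bits in (3, 4, 8):
        print(f"{bits}-bit: ~{weight_gb(total_params, bits):.0f} GB")
```

Under these assumptions 3-bit weights come to about 98 GB, roughly in line with the ~100 GB peak RAM in the Performance table. A 4-bit build would need about 127 GB and leave no headroom on a 128 GB machine.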

Poolside officially supports vLLM, SGLang, Transformers and TensorRT-LLM, all aimed at CUDA / datacenter deployments. This build is the Apple Silicon / edge path.

Not affiliated with Poolside. Weights © Poolside, Inc., released under Apache 2.0. Converted from the public release poolside/Laguna-M.1-FP8. The source weights were verified byte-identical to the public checkpoint, and the tokenizer, chat template and generation config are taken from that release.

Use

The laguna architecture is not upstream in mlx-lm yet (PR #1415 is pending). Until it is merged, install mlx-lm from the fork branch that adds the model file and the tool-call parser fix, pinned to a specific commit. The parser fix is needed for tool use in agentic clients:

pip install "git+https://github.com/eauchs/mlx-lm.git@5b0c0667f4a8c25ee9bb9ef729ab64822d1b246e"

mlx_lm.generate --model ox-ox/Laguna-M.1-MLX-Q3 \
  --prompt "Write a Python retry wrapper with exponential backoff." \
  --temp 1.0 --top-k 20

Once the PR is merged, pip install -U mlx-lm followed by the same mlx_lm.generate command will be enough, with no extra step.
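The command above samples with `--temp 1.0 --top-k 20`. The sketch below shows the standard top-k technique with temperature to illustrate what that combination does. It is not mlx-lm's sampler code, and `sample_top_k` is a hypothetical helper. The sampler keeps only the 20 highest-scoring tokens, applies a softmax at the given temperature, and draws one token from that distribution.

```python
import math
import random
from typing import Optional


def sample_top_k(
    logits: list,
    temp: float = 1.0,
    top_k: int = 20,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample a token id from the top_k highest logits after temperature scaling.

    Hypothetical helper illustrating standard top-k sampling; not mlx-lm's implementation.
    """
    if temp <= 0:
        # Temperature 0 degenerates to greedy decoding: take the argmax.
        return max(range(len(logits)), key=logits.__getitem__)
    if rng is None:
        rng = random.Random()
    # top_k <= 0 disables the filter and keeps the whole vocabulary.
    k = min(top_k, len(logits)) if top_k > 0 else len(logits)
    # Keep only the k token ids with the largest logits.
    candidates = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    # Softmax over the survivors at the given temperature;
    # subtracting the max keeps exp() from overflowing.
    scaled = [logits[i] / temp for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0, 3.0]
    rng = random.Random(0)
    # With top_k=2 only ids 4 (logit 3.0) and 0 (logit 2.0) can ever be drawn.
    draws = [sample_top_k(toy_logits, temp=1.0, top_k=2, rng=rng) for _ in range(1000)]
    print(sorted(set(draws)))
```

The top-k cut is applied before the softmax. As a result, a temperature of 1.0 leaves the relative odds among the surviving tokens unchanged, while tokens outside the top 20 can never be emitted.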

Performance

Quant	Hardware	Throughput	Peak RAM
Q3	Apple M3 Max 128 GB	~26.5 tok/s	~100 GB
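Token generation on Apple Silicon is typically memory-bandwidth bound, because each new token has to stream the active weights from unified memory. The sketch below computes a rough ceiling for this model. It assumes about 400 GB/s of bandwidth for the 128 GB M3 Max configuration, and 3.5 bits per weight for a 3-bit, group-size-64 MLX quantization. Both figures are assumptions, not measurements. The estimate ignores KV-cache reads and compute. `decode_upper_bound_tok_s` is an illustrative helper.

```python
def decode_upper_bound_tok_s(active_params: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token streams all active weights once."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    active = 23e9       # 23B active parameters per token (from this card)
    bpw = 3 + 32 / 64   # ASSUMPTION: 3-bit weights + 16-bit scale and bias per 64-weight group
    bw = 400.0          # ASSUMPTION: M3 Max (128 GB configuration) memory bandwidth, GB/s
    measured = 26.5     # from the Performance table
    bound = decode_upper_bound_tok_s(active, bpw, bw)
    print(f"ceiling ~{bound:.1f} tok/s; measured is {measured / bound:.0%} of it")
```

Under those assumptions the ceiling is roughly 40 tok/s, so the measured ~26.5 tok/s is about two thirds of it. Attention, KV-cache traffic and kernel overheads plausibly account for the rest.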

Architecture

70 layers (3 dense SwiGLU + 67 sparse MoE); 256 routed experts plus 1 shared expert, with sigmoid routing selecting the top 16 experts (top-k=16); global attention with 64 query / 8 KV heads, head_dim 128, QK-norm and softplus output gating; RoPE with YaRN. See the official model card for full details and benchmarks.
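The routing and attention figures above translate into two small calculations. The first is sigmoid top-k routing with an always-on shared expert, shown on a toy scalar input. The second is the per-token KV-cache size implied by grouped-query attention with 8 KV heads, each shared by 64 / 8 = 8 query heads. The function names are hypothetical. Three points are assumptions: the renormalization of the selected gate weights, a 16-bit cache, and every layer using this attention layout. This is not the mlx-lm laguna model code.

```python
import math
from typing import Callable, List, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_top_k_route(router_logits: List[float], top_k: int = 16) -> List[Tuple[int, float]]:
    """Pick the top_k experts by sigmoid score; return (expert_id, gate_weight) pairs.

    ASSUMPTION: selected scores are renormalized to sum to 1 (DeepSeek-V3-style);
    the card does not say whether Laguna does this or applies a scaling factor.
    """
    # Sigmoid scores are independent per expert, unlike a softmax over all experts.
    scores = [_sigmoid(z) for z in router_logits]
    chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


def moe_forward(x: float, router_logits: List[float],
                experts: List[Callable[[float], float]],
                shared_expert: Callable[[float], float], top_k: int = 16) -> float:
    """Toy MoE layer: the shared expert always runs, plus a weighted sum of top_k routed experts."""
    out = shared_expert(x)
    for expert_id, weight in sigmoid_top_k_route(router_logits, top_k):
        out += weight * experts[expert_id](x)
    return out


def kv_cache_bytes_per_token(n_layers: int = 70, n_kv_heads: int = 8,
                             head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """K and V for every layer; only the 8 KV heads are cached, not the 64 query heads.

    ASSUMPTION: every layer uses this attention layout and the cache is 16-bit.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    per_token = kv_cache_bytes_per_token()
    # 32k tokens is an arbitrary example length, not the model's context limit.
    print(f"KV cache: {per_token / 1024:.0f} KiB/token, "
          f"{per_token * 32_768 / 1e9:.1f} GB at 32k tokens")
```

With the card's numbers the cache comes to 280 KiB per token. That is about 9.4 GB for an example 32k-token context, which has to fit alongside the quantized weights.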

License

Apache 2.0 (weights and conversion). © Poolside, Inc. for Laguna M.1.

Citation

Conversion by Théophile Lafargue (@eauchs). Model by Poolside — see their technical report.
