Laguna-M.1-MLX-Q3

A community MLX (Apple Silicon) build of Poolside's Laguna M.1, a 225B-total / 23B-active Mixture-of-Experts (MoE) model, quantized to 3 bits so that it runs locally on a Mac with 128 GB of unified memory.
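As a rough check on the 128 GB figure, the weight footprint of a quantized model is about parameter count × bits per weight ÷ 8. The Python sketch below assumes MLX-style affine quantization with one 16-bit scale and one 16-bit bias per group of 64 weights (MLX's default group size). This build's actual group size, and any layers kept at higher precision, are not stated on the card, so treat the numbers as estimates. The function name is hypothetical.

```python
def quantized_weight_gb(n_params: float, bits: int, group_size: int = 64,
                        scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Approximate size in GB (1e9 bytes) of affine-quantized weights.

    Each group of `group_size` weights stores `bits` per weight plus one
    scale and one bias. The 16-bit scale/bias and group size of 64 are
    assumptions modeled on MLX's default quantization layout.
    """
    # Per-group scale and bias overhead, spread across the weights in the group.
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    total_params = 225e9  # 225B total parameters, as stated on this card
    for bits in (3, 4):
        gb = quantized_weight_gb(total_params, bits)
        print(f"{bits}-bit: ~{gb:.1f} GB of weights")
```

Under those assumptions, 3-bit weights come to roughly 98 GB, in line with the ~100 GB peak RAM reported under Performance, while a 4-bit build would need about 127 GB for the weights alone and would not fit comfortably on a 128 GB machine.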

Poolside officially supports vLLM, SGLang, Transformers and TRT-LLM, all targeting CUDA / datacenter hardware. This build provides the Apple Silicon / edge path.

Not affiliated with Poolside. Weights © Poolside, Inc., released under Apache 2.0. Converted from Poolside's public release poolside/Laguna-M.1-FP8: the source weights used for the conversion were verified byte-identical to the public checkpoint, and the tokenizer, chat template and generation config are taken from that release.

Use

The laguna architecture is not yet supported in upstream mlx-lm (PR #1415 is pending). Until it lands, install mlx-lm from the fork branch that adds the model file and the tool-call parser fix (needed for tool use in agentic clients), pinned to a specific commit:

pip install "git+https://github.com/eauchs/mlx-lm.git@5b0c0667f4a8c25ee9bb9ef729ab64822d1b246e"

mlx_lm.generate --model ox-ox/Laguna-M.1-MLX-Q3 \
  --prompt "Write a Python retry wrapper with exponential backoff." \
  --temp 1.0 --top-k 20
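The --temp 1.0 --top-k 20 flags request temperature sampling restricted to the 20 most likely next tokens. As an illustration of that general technique (not mlx-lm's implementation), here is a plain-Python sketch over a toy logit list; sample_top_k is a hypothetical name.

```python
import math
import random


def sample_top_k(logits, k=20, temperature=1.0, rng=random):
    """Sample a token index using temperature scaling plus top-k filtering.

    Illustrative only: shows what --temp / --top-k mean conceptually,
    not how mlx-lm implements them.
    """
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors after dividing their logits by the temperature.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0, 3.0]
    rng = random.Random(0)
    # With k=2 only indices 4 and 0 (the two largest logits) can be drawn.
    print([sample_top_k(toy_logits, k=2, temperature=1.0, rng=rng) for _ in range(10)])
```

With temperature 1.0, the probabilities among the surviving tokens are simply the model's own softmax probabilities renormalized over the top k; lower temperatures sharpen them toward greedy decoding.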

Once the PR is merged, pip install -U mlx-lm followed by the same mlx_lm.generate command will be all that is needed, with no pinned install.
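For agentic clients and other tools, the same install also provides mlx_lm.server, an OpenAI-compatible HTTP server (start it with mlx_lm.server --model ox-ox/Laguna-M.1-MLX-Q3; in the mlx-lm versions we know of it listens on port 8080 unless --port is given). The standard-library client below is a sketch: the base URL, the request fields, and the helper names build_chat_request and chat are assumptions, so check them against the server version you actually run.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080"
MODEL = "ox-ox/Laguna-M.1-MLX-Q3"


def build_chat_request(prompt, max_tokens=512, temperature=1.0):
    """Build an OpenAI-style /v1/chat/completions POST request (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send one chat turn and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example (requires the server to be running):
#   print(chat("Write a Python retry wrapper with exponential backoff."))
```

The long default timeout is deliberate: a 225B-parameter model on a laptop can take a while to produce a full reply, and this sketch waits for the complete (non-streamed) response.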

Performance

Quant | Hardware             | Throughput  | Peak RAM
------|----------------------|-------------|---------
Q3    | Apple M3 Max, 128 GB | ~26.5 tok/s | ~100 GB
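The table does not state the context length at which peak RAM was measured, and the KV cache grows with every token. Using the attention shape listed under Architecture (70 layers, 8 KV heads, head_dim 128), the per-token cache cost can be estimated as below. This sketch assumes all 70 layers keep a full, unquantized 16-bit (2-byte) KV cache for a single sequence; the cache dtype is an assumption and the function name is hypothetical.

```python
def kv_cache_bytes(n_tokens, n_layers=70, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV-cache size in bytes for a single sequence.

    Each layer stores one key and one value vector per KV head per token,
    hence the leading factor of 2. Defaults follow the Architecture section;
    the 2-byte (16-bit) cache dtype is an assumption.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


if __name__ == "__main__":
    per_token = kv_cache_bytes(1)
    print(f"per token: {per_token / 1024:.0f} KiB")
    for ctx in (8_192, 32_768, 131_072):
        print(f"{ctx:>7} tokens: ~{kv_cache_bytes(ctx) / 1e9:.1f} GB")
```

Under these assumptions the cache costs about 280 KiB per token, or roughly 9.4 GB at 32K tokens, so context length, not just weight size, decides how much headroom a 128 GB machine has left.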

Architecture

70 layers (3 dense SwiGLU + 67 sparse MoE); 256 routed experts + 1 shared expert with sigmoid top-k routing (k = 16); global attention with 64 query / 8 KV heads (grouped-query attention), head_dim 128, QK-norm and softplus output gating; RoPE with YaRN. See the official model card for full details and benchmarks.
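Sigmoid top-k routing differs from the more familiar softmax routing in that each expert receives an independent sigmoid affinity score. The plain-Python sketch below shows one common formulation (used, for example, in DeepSeek-V3-style MoE layers): score all 256 routed experts, keep the 16 highest, renormalize their scores into mixing weights, and always add the shared expert. Whether Laguna M.1 renormalizes, applies a routing scale, or adds a load-balancing bias is not stated here; the functions route and moe_output and the toy scalar experts are hypothetical.

```python
import math
import random

N_EXPERTS = 256  # routed experts (from the Architecture section)
TOP_K = 16       # experts selected per token


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def route(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for one token.

    Independent sigmoid scores, keep the top_k experts, then renormalize
    their scores so the selected weights sum to 1 (an assumed variant).
    """
    scores = [sigmoid(z) for z in router_logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


def moe_output(x, router_logits, experts, shared_expert):
    """Weighted sum of the selected experts plus the always-on shared expert."""
    mixed = sum(w * experts[i](x) for i, w in route(router_logits))
    return mixed + shared_expert(x)


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
    # Toy experts: expert i just scales its scalar input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(N_EXPERTS)]
    routing = route(logits)
    print(len(routing), round(sum(w for _, w in routing), 6))
    print(moe_output(1.0, logits, experts, shared_expert=lambda x: x))
```

Because each sigmoid score is independent, raising one expert's score does not lower the others' as it would under softmax; the renormalization step is what turns the selected scores into a convex combination.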

License

Apache 2.0 (weights and conversion). © Poolside, Inc. for Laguna M.1.

Citation

Conversion by Théophile Lafargue (@eauchs). Model by Poolside — see their technical report.
