How to use from the MLX library
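# Make sure mlx-lm is installed. Until the laguna PR is merged upstream,
# install the pinned branch shown under "Use" below instead of:
# pip install --upgrade mlx-lm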

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("ox-ox/Laguna-M.1-MLX-Q3")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Laguna-M.1-MLX-Q3

A community MLX (Apple Silicon) build of Poolside's Laguna M.1 — a 225B-total / 23B-active Mixture-of-Experts model — quantized to 3-bit so it runs locally on a 128 GB Mac.

Poolside ships official support for vLLM, SGLang, Transformers and TRT-LLM (all CUDA / datacenter). This is the Apple Silicon / edge path.

Not affiliated with Poolside. Weights © Poolside, Inc., released under Apache 2.0. Converted from the public release poolside/Laguna-M.1-FP8 (weights verified byte-identical to the public checkpoint; tokenizer, chat template and generation config taken from the public release).

Use

The laguna model isn't in upstream mlx-lm yet (PR #1415 is pending). Until it lands, install mlx-lm from the branch that ships the model file and the tool-call parser fix (needed for tool use in agentic clients), pinned to a commit:

pip install "git+https://github.com/eauchs/mlx-lm.git@5b0c0667f4a8c25ee9bb9ef729ab64822d1b246e"
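After installing, it can be useful to confirm that the pinned branch (and not a stock mlx-lm) is the one on the path. The sketch below only checks whether a model module is importable; the module name `mlx_lm.models.laguna` is an assumption based on mlx-lm's usual one-file-per-model layout, not something this card states, and `has_submodule` is a hypothetical helper.

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located, False otherwise."""
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ImportError:
        # The parent package itself is missing or failed to import.
        return False


# Assumed module name; adjust if the branch names the model file differently.
print("laguna model file present:", has_submodule("mlx_lm.models", "laguna"))
```

If this prints False, the environment is most likely still using an mlx-lm build without the model file.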

mlx_lm.generate --model ox-ox/Laguna-M.1-MLX-Q3 \
  --prompt "Write a Python retry wrapper with exponential backoff." \
  --temp 1.0 --top-k 20

Once the PR is merged, pip install -U mlx-lm followed by the mlx_lm.generate command above is all that is needed; there is no extra step.
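For use from Python rather than the command line, the same sampling settings (temperature 1.0, top-k 20) can be passed through a sampler object. This is a minimal sketch, assuming an mlx-lm version whose `make_sampler` accepts `top_k`; the `generate_sampled` wrapper and the `max_tokens=512` default are illustrative choices, not part of this release.

```python
MODEL_ID = "ox-ox/Laguna-M.1-MLX-Q3"


def generate_sampled(user_prompt, temp=1.0, top_k=20, max_tokens=512):
    """Load the model and generate one reply with temperature / top-k sampling."""
    # Imported lazily so this file can be imported on machines without MLX.
    from mlx_lm import load, generate
    from mlx_lm.sample_utils import make_sampler

    model, tokenizer = load(MODEL_ID)

    # Wrap the prompt in the model's chat template, as in the snippet at the top.
    messages = [{"role": "user", "content": user_prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # Mirrors the CLI flags --temp 1.0 --top-k 20.
    sampler = make_sampler(temp=temp, top_k=top_k)
    return generate(
        model, tokenizer, prompt=prompt, sampler=sampler, max_tokens=max_tokens
    )
```

Calling `generate_sampled("Write a Python retry wrapper with exponential backoff.")` on a machine with enough memory should behave like the CLI command above. Loading the model on every call is wasteful; a longer-lived script would load once and reuse `model` and `tokenizer`.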

Performance

| Quant | Hardware | Throughput | Peak RAM |
|---|---|---|---|
| Q3 | Apple M3 Max, 128 GB | ~26.5 tok/s | ~100 GB |
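The ~100 GB peak figure is roughly what back-of-the-envelope arithmetic predicts for 225B parameters at 3 bits. The sketch below assumes MLX-style affine quantization with a group size of 64 and a 16-bit scale plus 16-bit bias per group (32 overhead bits per group); those settings are assumptions about this build, and the estimate ignores any tensors kept at higher precision, the KV cache, and activations.

```python
def quantized_weight_gb(n_params, bits=3, group_size=64, overhead_bits_per_group=32):
    """Rough weight footprint in GB (1e9 bytes) for group-wise quantization."""
    # Each group of weights carries a scale and a bias, amortized per weight.
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e9


total_gb = quantized_weight_gb(225e9)   # all parameters (must fit in RAM)
active_gb = quantized_weight_gb(23e9)   # parameters touched per token
print(f"{total_gb:.1f} GB total, {active_gb:.1f} GB active per token")
```

Under these assumptions the weights alone come to about 98 GB, which is consistent with the measured peak and explains why a 128 GB machine is the practical minimum.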

Architecture

70 layers (3 dense SwiGLU + 67 sparse MoE), 256 experts + 1 shared, top-k=16 sigmoid routing; global attention 64 Q / 8 KV heads, head_dim 128, QK-norm + softplus output gating; RoPE with YaRN. See the official model card for full details and benchmarks.
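To make "top-k=16 sigmoid routing" concrete, here is a small pure-Python sketch of how one token could be routed across 256 experts. It is illustrative only: the expert count and top-k come from the description above, but whether the selected weights are renormalized, and how the shared expert is combined, are assumptions here and not taken from Poolside's implementation.

```python
import math
import random

NUM_EXPERTS = 256  # routed experts (from the card)
TOP_K = 16         # experts selected per token (from the card)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(logits, top_k=TOP_K, normalize=True):
    """Pick the top_k experts by independent sigmoid score for one token."""
    # Sigmoid scores each expert independently, unlike a softmax over experts.
    scores = [sigmoid(z) for z in logits]
    chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    weights = [scores[i] for i in chosen]
    if normalize:
        # Assumption: selected scores are rescaled to sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen, weights


# Random router logits stand in for a real router projection.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts, weights = route(logits)
print(len(experts), "experts selected out of", NUM_EXPERTS)
```

Only the 16 selected experts (plus the shared expert) run for a given token, which is why about 23B of the 225B parameters are active per token.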

License

Apache 2.0 (weights and conversion). © Poolside, Inc. for Laguna M.1.

Citation

Conversion by Théophile Lafargue (@eauchs). Model by Poolside — see their technical report.
