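Instructions to use ox-ox/Laguna-M.1-MLX-Q3 with libraries, notebooks, and local apps. Note that the plain `pip install --upgrade mlx-lm` / `uv tool install mlx-lm` commands below install upstream mlx-lm, which cannot load laguna yet; until PR #1415 lands, install the pinned branch described under Use.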
- Libraries
- MLX
How to use ox-ox/Laguna-M.1-MLX-Q3 with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("ox-ox/Laguna-M.1-MLX-Q3")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use ox-ox/Laguna-M.1-MLX-Q3 with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "ox-ox/Laguna-M.1-MLX-Q3"
```
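Loading a model this size can take a while, so a script that talks to the server may want to wait until it answers first. A minimal standard-library sketch, assuming the server listens on mlx_lm.server's default port 8080 and answers an OpenAI-style `GET /v1/models` request; `wait_for_server` is a hypothetical helper, not part of mlx-lm:

```python
import http.client
import json
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8080/v1",
                    timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll GET {base_url}/models until it answers with JSON or `timeout` expires.

    Assumes an OpenAI-style /models endpoint (hypothetical helper, not mlx-lm API).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                json.load(resp)  # a parseable body means the server is serving
                return True
        except (OSError, ValueError, http.client.HTTPException):
            # OSError covers URLError/HTTPError (not listening, not ready yet);
            # ValueError: body was not JSON; HTTPException: malformed response.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Usage: `wait_for_server()` returns `True` once the endpoint responds, or `False` after the timeout (60 s by default).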
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "ox-ox/Laguna-M.1-MLX-Q3" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
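If `~/.pi/agent/models.json` already lists other providers, the `mlx-lm` entry has to be merged in rather than pasted over the file. A small sketch of that merge, assuming the file holds only the `providers` layout shown in the Pi instructions; `add_mlx_provider` and `update_models_file` are hypothetical helpers, not part of Pi:

```python
import copy
import json
from pathlib import Path

# The provider entry from the Pi instructions (mlx_lm.server on port 8080).
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "ox-ox/Laguna-M.1-MLX-Q3"}],
}


def add_mlx_provider(config: dict, name: str = "mlx-lm") -> dict:
    """Return a copy of `config` with the MLX provider merged in; other providers are kept."""
    merged = copy.deepcopy(config)
    providers = merged.setdefault("providers", {})
    entry = providers.setdefault(name, copy.deepcopy(MLX_PROVIDER))
    # If the provider already existed, only append model ids it does not list yet.
    known = {m.get("id") for m in entry.setdefault("models", [])}
    for model in MLX_PROVIDER["models"]:
        if model["id"] not in known:
            entry["models"].append(dict(model))
    return merged


def update_models_file(path: Path = Path.home() / ".pi" / "agent" / "models.json") -> None:
    """Read Pi's models.json (or start empty), merge, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_mlx_provider(config), indent=2) + "\n")
```

Running `update_models_file()` once before starting `pi` matches the manual edit; running it again leaves the file unchanged.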
- Hermes Agent
How to use ox-ox/Laguna-M.1-MLX-Q3 with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "ox-ox/Laguna-M.1-MLX-Q3"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default ox-ox/Laguna-M.1-MLX-Q3
```
Run Hermes
```shell
hermes
```
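The three `hermes config set` calls in the Hermes setup can also be scripted, for example from a setup script that should skip the step on machines without Hermes. A sketch using only the commands shown there; `hermes_config_commands` and `apply_hermes_config` are hypothetical helpers, and the code assumes the installer put the `hermes` binary on `PATH`:

```python
import shutil
import subprocess

# The settings from the "Point Hermes at the local server" step.
HERMES_SETTINGS = {
    "model.provider": "custom",
    "model.base_url": "http://127.0.0.1:8080/v1",
    "model.default": "ox-ox/Laguna-M.1-MLX-Q3",
}


def hermes_config_commands(settings=HERMES_SETTINGS):
    """argv lists equivalent to the `hermes config set KEY VALUE` lines."""
    return [["hermes", "config", "set", key, value] for key, value in settings.items()]


def apply_hermes_config(settings=HERMES_SETTINGS) -> bool:
    """Run the commands if a `hermes` binary is on PATH; return False otherwise."""
    if shutil.which("hermes") is None:
        return False
    for argv in hermes_config_commands(settings):
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return True
```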
- MLX LM
How to use ox-ox/Laguna-M.1-MLX-Q3 with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "ox-ox/Laguna-M.1-MLX-Q3"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "ox-ox/Laguna-M.1-MLX-Q3"
# In a second terminal, call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ox-ox/Laguna-M.1-MLX-Q3",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
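The same request can be sent from Python with only the standard library, which is handy for scripting against the server. A minimal sketch, assuming the server is on its default port 8080 and returns the usual OpenAI-style `choices[0].message.content` field; `build_chat_request` and `chat` are hypothetical helpers, and `temperature`/`max_tokens` are standard OpenAI request fields chosen here for illustration:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # mlx_lm.server's default port
MODEL_ID = "ox-ox/Laguna-M.1-MLX-Q3"


def build_chat_request(messages, base_url=BASE_URL, model=MODEL_ID,
                       temperature=1.0, max_tokens=512):
    """Build (without sending) a POST to the OpenAI-style /chat/completions route."""
    body = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, **kwargs) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

For a multi-turn conversation, append each reply as `{"role": "assistant", "content": reply}` to the message list before sending the next user message.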
Laguna-M.1-MLX-Q3
A community MLX (Apple Silicon) build of Poolside's Laguna M.1 — a 225B-total / 23B-active Mixture-of-Experts model — quantized to 3-bit so it runs locally on a 128 GB Mac.
Poolside ships official support for vLLM, SGLang, Transformers and TRT-LLM (all CUDA / datacenter). This is the Apple Silicon / edge path.
Not affiliated with Poolside. Weights © Poolside, Inc., released under Apache 2.0. Converted from the public release poolside/Laguna-M.1-FP8 (the FP8 source weights were verified byte-identical to the public checkpoint; tokenizer, chat template and generation config are taken from the public release).
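The card doesn't say how the byte-identity check was done; one common way to confirm that two copies of a checkpoint match is to compare SHA-256 digests shard by shard. A minimal sketch, assuming both copies are local directories of `*.safetensors` shards; `sha256_file` and `diff_checkpoints` are hypothetical helpers:

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so multi-GB shards never sit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_checkpoints(local_dir: Path, reference_dir: Path,
                     pattern: str = "*.safetensors") -> list:
    """Names of shards missing from either directory or whose contents differ."""
    local = {p.name: p for p in Path(local_dir).glob(pattern)}
    reference = {p.name: p for p in Path(reference_dir).glob(pattern)}
    mismatched = set(local) ^ set(reference)  # present on one side only
    for name in set(local) & set(reference):
        if sha256_file(local[name]) != sha256_file(reference[name]):
            mismatched.add(name)
    return sorted(mismatched)
```

An empty list from `diff_checkpoints(Path("copy-a"), Path("copy-b"))` (hypothetical directory names) means every shard matches byte for byte.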
Use
`laguna` isn't upstream in mlx-lm yet (PR #1415 pending). Until it lands, install mlx-lm from the branch that ships the model file and the tool-call parser fix (needed for tool use in agentic clients), pinned to a commit:

```shell
pip install "git+https://github.com/eauchs/mlx-lm.git@5b0c0667f4a8c25ee9bb9ef729ab64822d1b246e"
```
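Before running the model, it can help to confirm that the installed mlx-lm actually ships the architecture file. mlx-lm imports the module under `mlx_lm.models` named after the checkpoint's `model_type` (after a small remapping table); a minimal check, assuming that module is called `laguna` (inferred from the note above, not confirmed by the card), with `has_model_module` as a hypothetical helper:

```python
import importlib.util


def has_model_module(package: str = "mlx_lm.models", name: str = "laguna") -> bool:
    """True if `package.name` is importable in the current environment.

    find_spec imports the parent package first, so a missing parent (e.g. mlx-lm
    not installed at all) raises ImportError; report that as False too.
    """
    try:
        return importlib.util.find_spec(f"{package}.{name}") is not None
    except ImportError:
        return False
```

`has_model_module()` should return `True` in the environment where the pinned branch was installed, and `False` with an mlx-lm release that predates the PR.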
```shell
mlx_lm.generate --model ox-ox/Laguna-M.1-MLX-Q3 \
  --prompt "Write a Python retry wrapper with exponential backoff." \
  --temp 1.0 --top-k 20
```
Once the PR is merged, this is just `pip install -U mlx-lm` plus the `mlx_lm.generate` command; no extra step.
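The `--temp 1.0 --top-k 20` flags control sampling: only the 20 highest-scoring tokens stay candidates, and temperature 1.0 leaves the model's relative probabilities among them unchanged. A pure-Python sketch of that generic technique for illustration; `sample_top_k` is hypothetical, not mlx-lm's sampler, which works on MLX arrays and may apply the steps in a different order:

```python
import math
import random


def sample_top_k(logits, temperature=1.0, top_k=20, rng=random):
    """Draw one token id from `logits` with top-k filtering and temperature.

    Generic illustration of what --temp / --top-k control; not mlx-lm's sampler.
    """
    ids = range(len(logits))
    if temperature <= 0:
        return max(ids, key=logits.__getitem__)  # temperature 0: greedy decoding
    k = min(top_k, len(logits)) if top_k > 0 else len(logits)  # top_k <= 0: no filter
    kept = sorted(ids, key=logits.__getitem__, reverse=True)[:k]
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]  # unnormalised softmax
    return rng.choices(kept, weights=weights, k=1)[0]
```

Raising `--top-k` widens the candidate set; lowering `--temp` below 1.0 sharpens the distribution toward the top token.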
Performance
| Quant | Hardware | Throughput | Peak RAM |
|---|---|---|---|
| Q3 | Apple M3 Max 128 GB | ~26.5 tok/s | ~100 GB |
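A back-of-envelope check of the table: weight memory, KV-cache growth with context length, and a bandwidth-bound ceiling on decode speed. The sketch assumes every parameter is stored at 3 bits with mlx-lm's default group size of 64 and one 16-bit scale and bias per group (3.5 effective bits per weight), that all 70 layers use the 8-KV-head, head_dim 128 attention listed under Architecture with a 16-bit cache, a hypothetical 32k-token context, and 400 GB/s of memory bandwidth for the M3 Max. None of these assumptions are confirmed by the card, so treat the outputs as rough estimates:

```python
GB = 1e9  # decimal gigabytes, as in the table


def quantized_bytes(n_params: float, bits: int = 3, group_size: int = 64,
                    scale_bits: int = 16) -> float:
    """Bytes for affine-quantized weights: `bits` per weight plus one scale and one
    bias (each `scale_bits` wide) per group of `group_size` weights."""
    bits_per_weight = bits + 2 * scale_bits / group_size  # 3 + 32/64 = 3.5
    return n_params * bits_per_weight / 8


def kv_cache_bytes_per_token(layers: int = 70, kv_heads: int = 8,
                             head_dim: int = 128, bytes_per_elem: int = 2) -> int:
    """Keys plus values for one token across all layers (16-bit cache assumed)."""
    return layers * 2 * kv_heads * head_dim * bytes_per_elem


def decode_ceiling_tok_s(active_params: float, bandwidth_gb_s: float, **quant) -> float:
    """Upper bound on tokens/s if each token must stream all active weights once."""
    return bandwidth_gb_s * GB / quantized_bytes(active_params, **quant)


weights = quantized_bytes(225e9)                          # all 225B parameters
kv_32k = kv_cache_bytes_per_token() * 32_768              # hypothetical 32k context
ceiling = decode_ceiling_tok_s(23e9, bandwidth_gb_s=400)  # 23B active, 400 GB/s assumed
print(f"weights ~{weights / GB:.1f} GB, KV@32k ~{kv_32k / GB:.1f} GB, "
      f"ceiling ~{ceiling:.1f} tok/s")
# weights ~98.4 GB, KV@32k ~9.4 GB, ceiling ~39.8 tok/s
```

Under these assumptions the weights alone come to roughly 98 GB, close to the ~100 GB peak in the table, and the measured ~26.5 tok/s stays under the ~40 tok/s ceiling, which ignores compute, routing and attention cost.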
Architecture
70 layers (3 dense SwiGLU + 67 sparse MoE), 256 experts + 1 shared, top-k=16 sigmoid routing; global attention 64 Q / 8 KV heads, head_dim 128, QK-norm + softplus output gating; RoPE with YaRN. See the official model card for full details and benchmarks.
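The routing described above can be sketched in a few lines: each of the 256 experts gets an independent sigmoid score, the 16 highest are kept, and their outputs are mixed with the shared expert's. This is a generic illustration of sigmoid top-k routing in pure Python; renormalising the kept scores to sum to 1 is an assumption borrowed from other sigmoid-routed MoEs, and any router bias or scaling factor Laguna uses is left out. `route` and `moe_forward` are hypothetical names:

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def route(router_logits, top_k=16):
    """Sigmoid top-k routing: score each expert independently, keep the top_k,
    and renormalise the kept scores so the mixing weights sum to 1 (assumed)."""
    scores = [sigmoid(z) for z in router_logits]
    chosen = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def moe_forward(x, router_logits, experts, shared_expert, top_k=16):
    """Shared expert output plus the weighted sum of the selected routed experts.

    `x` is a list of floats; every expert is any callable list -> list (in the real
    model each one is a feed-forward block, here they are stand-ins)."""
    out = list(shared_expert(x))  # the shared expert processes every token
    for i, w in route(router_logits, top_k).items():
        out = [o + w * y for o, y in zip(out, experts[i](x))]
    return out


# Demo: random router logits for 256 experts; only 16 are activated.
rng = random.Random(0)
demo = route([rng.gauss(0.0, 1.0) for _ in range(256)])
print(len(demo), round(sum(demo.values()), 6))  # 16 1.0
```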
License
Apache 2.0 (weights and conversion). © Poolside, Inc. for Laguna M.1.
Citation
Conversion by Théophile Lafargue (@eauchs). Model by Poolside — see their technical report.
Model tree for ox-ox/Laguna-M.1-MLX-Q3: quantized from the base model poolside/Laguna-M.1.