
Darwin-36B-Opus MLX Text-Only 8-bit

This repository contains a text-only 8-bit MLX conversion of FINAL-Bench/Darwin-36B-Opus.

The model was converted with mlx-lm and is intended for efficient inference on Apple Silicon.

Model Details

  • Original model: FINAL-Bench/Darwin-36B-Opus
  • Format: MLX
  • Quantization: 8-bit
  • Modality: Text-only
  • Runtime: Apple Silicon
  • Recommended server: cubist38/mlx-openai-server
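As a rough sizing aid, the weight footprint of an 8-bit conversion can be estimated from the parameter count. The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes about 36 billion parameters (read off the model name) and a group-wise quantization layout with one 16-bit scale and one 16-bit bias per group of 64 weights. The parameter count, group size, and per-group overhead are all assumptions, and the figure excludes the KV cache and runtime overhead.

```python
def estimate_weight_gib(
    n_params: float,
    bits: int = 8,
    group_size: int = 64,              # assumed quantization group size
    overhead_bits_per_group: int = 32,  # assumed: 16-bit scale + 16-bit bias
) -> float:
    """Estimate the quantized weight footprint in GiB."""
    # Effective bits per weight = payload bits + amortized per-group metadata.
    bits_per_weight = bits + overhead_bits_per_group / group_size
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 36e9 is a nominal figure taken from the model name, not a measured count.
    print(f"~{estimate_weight_gib(36e9):.1f} GiB of weights")
```

Layers that are left unquantized would push the real number higher, so treat the result as a lower-bound guide when checking whether the model fits in unified memory.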

Run with mlx-openai-server

This model is designed to be served through my open-source OpenAI-compatible MLX server:

cubist38/mlx-openai-server

Install the server:

pip install mlx-openai-server

Then launch the model:

mlx-openai-server launch \
  --model-path Darwin-36B-Opus-mlx-text-only-8bit \
  --reasoning-parser qwen3_moe \
  --tool-call-parser qwen3_coder \
  --debug \
  --served-model-name Darwin-36B-Opus

The server exposes an OpenAI-compatible API, making it easy to use with existing OpenAI SDKs, agents, and tools.
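Before pointing an agent or SDK at the server, it can help to confirm which model name it is serving. The following is a minimal, dependency-free sketch that assumes the server implements the OpenAI-style `GET /v1/models` listing and listens on port 8000, as in the examples in this README. The helper names are illustrative and not part of any library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # port taken from the examples in this README


def models_url(base_url: str) -> str:
    """Join the base URL and the models path without doubling slashes."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} listing."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Query the running server; raises urllib.error.URLError if unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With the server running, `print(list_served_models())` should include the value passed to `--served-model-name` if the listing endpoint is available.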

Example: OpenAI-Compatible Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="Darwin-36B-Opus",
    messages=[
        {
            "role": "user",
            "content": "Explain evolutionary model merging in simple terms.",
        }
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
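For long answers, streaming tokens as they arrive is often more pleasant than waiting for the full response. The sketch below assumes the server honors `"stream": true` and emits OpenAI-style server-sent events (`data: {...}` lines terminated by `data: [DONE]`). It shows only the parsing step, using the standard library, and the function name is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chat chunks."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text
```

To use it, send the same request body as above with `"stream": true` added, decode the HTTP response line by line, and pass those lines to `iter_content_deltas`, printing each fragment as it is yielded.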

Example: curl

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer not-needed" \
  -d '{
    "model": "Darwin-36B-Opus",
    "messages": [
      {
        "role": "user",
        "content": "What makes Darwin-36B-Opus interesting?"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
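The curl call prints a JSON document in the OpenAI chat-completions shape. A small sketch for pulling the assistant text out of that JSON, and for noticing when the reply was cut off by the `max_tokens` limit, might look like this. It assumes the response carries the standard `choices[0].message.content` and `finish_reason` fields, and the function name is illustrative.

```python
import json


def summarize_completion(response_text: str) -> tuple[str, bool]:
    """Return (assistant_text, truncated) from a chat-completions JSON body."""
    body = json.loads(response_text)
    choices = body.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    content = (first.get("message") or {}).get("content") or ""
    # In the OpenAI format, finish_reason "length" means max_tokens was reached.
    truncated = first.get("finish_reason") == "length"
    return content, truncated
```

If the second value is `True`, raising `max_tokens` above 512 is the usual remedy.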

Notes

  • This is a text-only MLX conversion.
  • This is an 8-bit quantized version, so outputs may differ from the original checkpoint.
  • The recommended way to serve this model is through mlx-openai-server.
  • The launch command uses:
    • --reasoning-parser qwen3_moe
    • --tool-call-parser qwen3_coder
    • --served-model-name Darwin-36B-Opus (the name clients pass as "model" in their requests)
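The tool-call parser matters when clients send function definitions. The following sketch shows what such a request and its reply could look like, assuming the server returns parsed calls in the OpenAI `tool_calls` shape with JSON-encoded arguments. The `get_weather` tool and the helper names are hypothetical and exist only for illustration.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format (illustrative only).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt: str, model: str = "Darwin-36B-Opus") -> dict:
    """Build a chat-completions body that offers one tool to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def parse_tool_calls(message: dict) -> list[tuple[str, dict]]:
    """Return (function_name, arguments) pairs from an assistant message."""
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

The body from `build_tool_request` can be posted to `/v1/chat/completions` in place of the plain request, and `parse_tool_calls` can then be applied to `choices[0].message` of the reply.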

Attribution

All credit for the original model goes to FINAL-Bench/Darwin-36B-Opus.

This repository provides only an MLX text-only 8-bit conversion for Apple Silicon users.

License

Please refer to the original model repository for licensing and usage terms.
