Mesh LLM

Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL

Distributed GGUF inference package for Mesh LLM

Website GitHub Discord

GGUF layer package for running Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL across a local Mesh LLM cluster.

This package is derived from unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF and keeps the original GGUF distribution split into per-layer artifacts for distributed inference.

Highlights

| Run locally | Pool multiple machines | OpenAI-compatible | Package variant |
| --- | --- | --- | --- |
| Private inference on your hardware | Split layers across peers | Serve /v1/chat/completions locally | UD-Q4_K_XL layer package |

Model Overview

| Property | Value |
| --- | --- |
| Source model | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF |
| Model id | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF:UD-Q4_K_XL |
| Family | Devstral |
| Parameter scale | 24B |
| Quantization | UD-Q4_K_XL |
| Layer count | 40 |
| Activation width | 5120 |
| Package size | 13.8 GB |
| Source file | Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf |
| Package repo | meshllm/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.
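
For the multi-machine case, a rough sizing estimate can help decide how many peers to pool. The sketch below is only illustrative: it assumes the 40 transformer layers are spread evenly across peers, which may not match how Mesh LLM actually places layers, and it ignores the shared artifacts, KV cache, and runtime overhead. The function names `layers_per_peer` and `layer_weights_gb` are hypothetical helpers, not part of any Mesh LLM tooling.

```shell
# Layers held by the busiest peer if layers are spread evenly (ceiling division).
# Assumption: an even split; the real placement policy may differ.
layers_per_peer() {
  total_layers=$1
  peers=$2
  echo $(( (total_layers + peers - 1) / peers ))
}

# Approximate weight footprint in GB for n layers, given the total size of
# all layer artifacts and the total layer count.
layer_weights_gb() {
  LC_ALL=C awk -v n="$1" -v total_gb="$2" -v layers="$3" \
    'BEGIN { printf "%.2f\n", n * total_gb / layers }'
}

# Example with this package's figures (40 layers, 13.0 GB of layer artifacts):
#   layers_per_peer 40 3          # layers on the busiest of 3 peers
#   layer_weights_gb 14 13.0 40   # approximate GB of weights for 14 layers
```

Treat the result as a lower bound on per-peer memory rather than a requirement.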

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF.

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL-layers" --split
# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models
# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF:UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
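
When the prompt comes from a script variable rather than a literal, hand-written JSON breaks as soon as the text contains a double quote. One possible approach is a small payload builder like the sketch below. `json_escape` and `build_chat_payload` are hypothetical helper names; the escaping covers only backslashes and double quotes, so it assumes single-line prompts without control characters.

```shell
# Escape backslashes and double quotes for use inside a JSON string literal.
# Assumption: the input has no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a minimal chat-completions request body.
# Arguments: model name, user prompt, optional max_tokens (defaults to 128).
build_chat_payload() {
  model=$(json_escape "$1")
  prompt=$(json_escape "$2")
  max_tokens=${3:-128}
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%s}\n' \
    "$model" "$prompt" "$max_tokens"
}

# Example: pipe the payload to the local endpoint from the Quickstart.
#   build_chat_payload "unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF:UD-Q4_K_XL" \
#     'Explain what "ownership" means in Rust.' |
#     curl -s http://localhost:3131/v1/chat/completions \
#       -H "Content-Type: application/json" -d @-
```

For prompts with newlines or non-ASCII control characters, a real JSON encoder is the safer choice.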

Package Variant

| Property | Value |
| --- | --- |
| Format | layer-package |
| Canonical source ref | unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF@main/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf |
| Source revision | main |
| Source SHA-256 | b44e34b78180fc3ab1abbe1edad9f1f3926fdca10eed3bfae168b065e683f6cd |
| Skippy ABI | 0.1.25 |
| Package manifest SHA-256 | 1527887657160605200ea0d37353c695f1fe7d74b8d310b1de2a6e371102b67f |

What Is Included

| Artifact | Path | Contents | SHA-256 |
| --- | --- | --- | --- |
| Manifest | model-package.json | Package schema, source identity, checksums | 1527887657160605200ea0d37353c695f1fe7d74b8d310b1de2a6e371102b67f |
| Metadata | shared/metadata.gguf | 0 tensors, 8.0 MB | 86fec00c5793bcf438dd2d1c5f75f782c79ea84cf6aaa4450bcc2bd146341e68 |
| Embeddings | shared/embeddings.gguf | 1 tensor, 368.0 MB | 5cdea2e2b47b8d751d31151a3f63196b8723fe8588621228b94c126cc9977ea7 |
| Output head | shared/output.gguf | 2 tensors, 533.0 MB | 3c7be3f6156a76320542e283b933580a1988415c2696a3b8b37ad58f5a68be0b |
| Transformer layers | layers/layer-*.gguf | 40 layer artifacts, 360 tensors, 13.0 GB | see model-package.json |

Validation

Generated by the Mesh LLM HF Jobs splitter from mesh-llm ref main. Each artifact is checksummed as it is written, uploaded to this repository, and removed from the job workspace before the next artifact is produced.

skippy-model-package write-package "/source/Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL.gguf" --out-dir "/tmp/meshllm-layer-job-meshllm_Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL-layers-199/package"
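
Because every artifact is published with a SHA-256 digest, a downloaded copy can be checked locally before use. The following is a minimal sketch, assuming the files have already been downloaded into the current directory with the repository's relative paths. `sha256_of` and `verify_sha256` are hypothetical helper names, and the sketch does not parse `model-package.json`, whose schema is not described here.

```shell
# Print the SHA-256 hex digest of a file, using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Return 0 if the file's digest equals the expected hex string, 1 otherwise.
verify_sha256() {
  actual=$(sha256_of "$1") || return 1
  if [ "$actual" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1 (got $actual)" >&2
    return 1
  fi
}

# Example, using digests listed under "What Is Included":
#   verify_sha256 model-package.json \
#     1527887657160605200ea0d37353c695f1fe7d74b8d310b1de2a6e371102b67f
#   verify_sha256 shared/metadata.gguf \
#     86fec00c5793bcf438dd2d1c5f75f782c79ea84cf6aaa4450bcc2bd146341e68
```

A mismatch usually points to an incomplete download, so re-fetching the artifact is a reasonable first step.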
