Mellum2 12B A2.5B Instruct GGUF for ik_llama

This repository contains GGUF conversions of JetBrains/Mellum2-12B-A2.5B-Instruct.

The files were converted with an ik_llama.cpp branch that adds Mellum2 architecture support and emits the Mellum sliding-window and RoPE/YaRN metadata that GGUF runtimes need.

These files are intended as persistent convenience artifacts for ik_llama reviewers and users. They should also run on current llama.cpp builds that support the Mellum architecture.

No performance or model-quality claims are made here.

Files

| File | Type | SHA-256 |
| --- | --- | --- |
| `Mellum2-12B-A2.5B-Instruct-ik-llama-BF16.gguf` | BF16 reference conversion | `6a322a3f6c59cdd9b4eee3ea678d964572d4b3dc07e52965f235823013d352e0` |
| `Mellum2-12B-A2.5B-Instruct-ik-llama-Q8_0.gguf` | Q8_0 quantization | `a7db12ebf1e0567927b5a7433dafe98535fd3b75ead9e23f008f1219a6bc90bb` |
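
The digests above can be checked after download. The following is a minimal sketch, assuming the files sit in the current working directory; the helper names `sha256_of` and `verify` are illustrative and not part of any tool mentioned in this card.

```python
import hashlib
from pathlib import Path

# Digests copied from the Files table above.
EXPECTED_SHA256 = {
    "Mellum2-12B-A2.5B-Instruct-ik-llama-BF16.gguf":
        "6a322a3f6c59cdd9b4eee3ea678d964572d4b3dc07e52965f235823013d352e0",
    "Mellum2-12B-A2.5B-Instruct-ik-llama-Q8_0.gguf":
        "a7db12ebf1e0567927b5a7433dafe98535fd3b75ead9e23f008f1219a6bc90bb",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte GGUF never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the table entry for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
        else:
            print(name, "not found, skipped")
```

Reading in fixed-size chunks keeps memory use flat regardless of file size, which matters for the BF16 file in particular.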

Provenance

Source model: JetBrains/Mellum2-12B-A2.5B-Instruct
Source snapshot: 4ee5751ef73ac6ae5a65b76b092ffc7c3b9c60e3
Converter/runtime branch: joelfarthing/ik_llama.cpp, branch mellum2-support
PR companion branch URL: https://github.com/joelfarthing/ik_llama.cpp/tree/mellum2-support

The embedded chat template is the stock JetBrains Instruct template:

tokenizer.chat_template sha256 = e674cbec4c384ab50c18c91d8cada3b6931d7a7ee25d9db004366aa440c1ca86

The converted GGUF metadata includes:

mellum.attention.sliding_window = 1024
mellum.attention.sliding_window_pattern
mellum.rope.freq_base = 500000.0
mellum.rope.freq_base_swa = 500000.0
mellum.rope.scaling.type = yarn
mellum.rope.scaling.factor = 16.0
mellum.rope.scaling.original_context_length = 8192
mellum.rope.scaling.yarn_attn_factor = 1.2772588729858398
mellum.rope.scaling.yarn_beta_fast = 32.0
mellum.rope.scaling.yarn_beta_slow = 1.0
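
Two of these values can be related by simple arithmetic. The sketch below assumes the attention factor follows the formula from the YaRN paper, 0.1 · ln(s) + 1 for scaling factor s, and that the scaled context is the original context length multiplied by the scaling factor. Neither assumption is stated in this card; the converter branch is the authority on how the values were actually produced.

```python
import math
import struct

# Values copied from the metadata listing above.
SCALING_FACTOR = 16.0
ORIGINAL_CONTEXT = 8192
STORED_ATTN_FACTOR = 1.2772588729858398


def to_float32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def yarn_attn_factor(scale):
    """YaRN paper attention scaling, 0.1 * ln(s) + 1 (assumed to apply here)."""
    return 0.1 * math.log(scale) + 1.0


# Assumption: scaled context = scaling factor * original context length.
scaled_context = int(SCALING_FACTOR * ORIGINAL_CONTEXT)

# GGUF stores this key as a 32-bit float, so round before comparing.
computed = to_float32(yarn_attn_factor(SCALING_FACTOR))

print("scaled context length:", scaled_context)
print("computed attention factor:", computed)
print("close to stored value:", math.isclose(computed, STORED_ATTN_FACTOR, rel_tol=1e-6))
```

Under those assumptions the scaled context works out to 16 × 8192 = 131072 tokens. This says nothing about how much context a given machine can hold in memory.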

Local Validation

The BF16 and Q8_0 files were smoke-tested locally on an RTX 4070 with CUDA server builds.

Validation included:

Q8_0 with ik_llama.cpp CUDA server and --cpu-moe
Q8_0 with current llama.cpp upstream CUDA server and --cpu-moe
BF16 with current llama.cpp upstream CUDA server and --cpu-moe
OpenAI-compatible chat completion request using the embedded chat template
deterministic long-code prompt
python3 -m py_compile on the extracted code
functional topological-sort test including cycle detection

The long-code smoke test is a runtime sanity check only. It is not a benchmark and does not imply any quality ranking.
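
The prompt and test harness used for these checks are not published in this card. As a purely hypothetical illustration of the last two steps, the sketch below byte-compiles a piece of source text and runs a functional topological-sort check with cycle detection. The function names, the sample graphs, and the convention of raising `ValueError` on a cycle are all assumptions, and `topo_sort` is a reference Kahn's-algorithm implementation standing in for model-generated code.

```python
import py_compile
import tempfile
from collections import deque
from pathlib import Path


def compiles(source):
    """Write source to a temporary file and byte-compile it, like python3 -m py_compile."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source, encoding="utf-8")
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError:
            return False
    return True


def topo_sort(graph):
    """Reference Kahn's algorithm; raises ValueError when the graph has a cycle."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    indegree = {node: 0 for node in nodes}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    queue = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in graph.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def check_topo_sort(candidate):
    """Functional test: a valid ordering on a DAG and ValueError on a cyclic graph."""
    dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    order = candidate(dag)
    position = {node: index for index, node in enumerate(order)}
    edges_ok = all(position[u] < position[v] for u in dag for v in dag[u])
    try:
        candidate({"x": ["y"], "y": ["x"]})
        cycle_detected = False
    except ValueError:
        cycle_detected = True
    return edges_ok and len(order) == len(dag) and cycle_detected


print("compiles:", compiles("def f():\n    return 1\n"))
print("topological sort check:", check_topo_sort(topo_sort))
```

The check validates the ordering against the edges instead of comparing to one fixed answer, since a DAG usually has more than one valid topological order.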

Example

./llama-server \
  -m Mellum2-12B-A2.5B-Instruct-ik-llama-Q8_0.gguf \
  -ngl 99 \
  --cpu-moe \
  -c 4096 \
  -b 512 \
  -ub 512 \
  --jinja
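
With the server from the command above running, a chat completion request can be sent to its OpenAI-compatible endpoint. This is a minimal sketch using only the Python standard library. It assumes the server listens on `127.0.0.1:8080` (the llama-server default when no `--port` is given); the prompt, `max_tokens`, and `temperature` values are illustrative choices, not the ones used for validation.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default address and port.
BASE_URL = "http://127.0.0.1:8080/v1"


def build_request(prompt, max_tokens=256):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,  # greedy decoding for repeatable output
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the text of the first choice."""
    with urllib.request.urlopen(build_request(prompt), timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    try:
        print(chat("Write a Python function that reverses a string."))
    except OSError as error:
        # Covers connection failures and timeouts, e.g. when the server is not running.
        print("request failed:", error)
```

Because the server was started with `--jinja`, the messages are formatted with the chat template embedded in the GGUF file, so the client only needs to send role/content pairs.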

License

The source model card lists the license as Apache-2.0. See the upstream JetBrains model card for the authoritative license and model documentation.
