---
license: apache-2.0
library_name: llama.cpp
pipeline_tag: text-generation
tags:
- gguf
- llama.cpp
- hermes-agent
- qwen3.5
- qwen
- tool-use
- local-agent
base_model:
- kai-os/carnice-v1-9b-hermes-agent-stage2-merged
---

![banner](./banner.png)

# Carnice-9b-GGUF

GGUF builds of `Carnice-9b`, a model built from `Qwen/Qwen3.5-9B` and trained specifically for the Hermes-Agent harness.

This repo contains three quantized variants:

- `Carnice-9b-Q4_K_M.gguf`
- `Carnice-9b-Q6_K.gguf`
- `Carnice-9b-Q8_0.gguf`

## Quantizations

| File | Quant | File size | Recommended use |
|---|---|---:|---|
| `Carnice-9b-Q4_K_M.gguf` | Q4_K_M (4-bit) | 5.3 GB | smallest file, fastest local testing |
| `Carnice-9b-Q6_K.gguf` | Q6_K (6-bit) | 6.9 GB | best quality/size balance |
| `Carnice-9b-Q8_0.gguf` | Q8_0 (8-bit) | 8.9 GB | highest-quality GGUF option |
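
As a rough illustration of how the table might guide a choice, the sketch below picks the largest file that fits a given budget. The function name `pick_quant` and the budget-in-MB convention are hypothetical, not part of this repo. File size is only a lower bound on memory use, since the KV cache and runtime buffers add to it.

```bash
# Hypothetical helper: print the largest quant whose file size fits a budget.
# Argument: budget in MB. Thresholds are the file sizes from the table
# (GB x 1000); actual memory use at runtime will be higher.
pick_quant() {
  budget_mb="$1"
  if [ "$budget_mb" -ge 8900 ]; then
    echo "Carnice-9b-Q8_0.gguf"
  elif [ "$budget_mb" -ge 6900 ]; then
    echo "Carnice-9b-Q6_K.gguf"
  elif [ "$budget_mb" -ge 5300 ]; then
    echo "Carnice-9b-Q4_K_M.gguf"
  else
    echo "no quant fits in ${budget_mb} MB" >&2
    return 1
  fi
}

pick_quant 8000   # prints Carnice-9b-Q6_K.gguf
```

Leaving headroom above the file size (for example, budgeting a few GB less than the memory actually available) is a sensible precaution, though the right margin depends on context length and backend.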

## Source model

The GGUF files were exported from the merged source model:

- [`kai-os/carnice-v1-9b-hermes-agent-stage2-merged`](https://huggingface.co/kai-os/carnice-v1-9b-hermes-agent-stage2-merged)

## What it was trained for

Carnice-9b was trained for Hermes-Agent behavior rather than for general chat quality. The training mixture emphasized:

- Hermes-native terminal/file/browser trajectories
- tool-oriented multi-turn agent behavior
- reasoning-repair data to recover general reasoning after the first Hermes-specific tuning pass
- a second Hermes refresh stage to pull the model back toward harness-native action formatting and tool usage

## llama.cpp

```bash
llama-cli -m Carnice-9b-Q6_K.gguf -p "Reply with exactly READY." -n 16
```
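
Before running the command above on a fresh download, it can help to confirm that the file really is a GGUF file. The sketch below checks for the 4-byte `GGUF` magic at the start of the file and then runs the same prompt. The function names `is_gguf` and `smoke_test` are hypothetical helpers, not part of llama.cpp.

```bash
# Succeed if the file starts with the 4-byte GGUF magic ("GGUF").
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Hypothetical wrapper: verify the file, then run the README's smoke prompt.
# Returns 1 if the file is missing or not GGUF, 2 if llama-cli is not installed.
smoke_test() {
  model="$1"
  if ! is_gguf "$model"; then
    echo "not a GGUF file: $model" >&2
    return 1
  fi
  if ! command -v llama-cli >/dev/null 2>&1; then
    echo "llama-cli not found on PATH" >&2
    return 2
  fi
  llama-cli -m "$model" -p "Reply with exactly READY." -n 16
}

# Usage: smoke_test Carnice-9b-Q6_K.gguf
```

A magic-byte check only catches truncated-at-zero or mislabeled files (for example, an HTML error page saved under a `.gguf` name); it does not validate the rest of the file.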

## Notes

These are GGUF exports of the merged, standalone Carnice model, not PEFT adapters, so each file loads on its own without a separate base model.