Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF / benchmark_results /AUDIT_hellaswag.md

deucebucket

results: full audited benchmark suite + adversarial audit reports

9cdbf39 verified 10 days ago


HellaSwag Benchmark Audit — heretic-cerebellum-v1

Date: 2026-06-11
Auditor: Adversarial automated audit (adversarial-hellaswag-audit)
Benchmark run timestamp: 2026-06-11 16:02
File audited: heretic-cerebellum-v1_hellaswag_detailed.jsonl
Reported score: 91.78%

Verdict: TRUSTWORTHY

All six check categories pass. The reported score of 91.78% is correct. No artifacts, no fallbacks, no transport failures, no cache contamination detected.

Section 1: Schema Reconnaissance

Fields present in every entry (10042/10042):

| Field | Type | Sample values |
| --- | --- | --- |
| `context` | str | "A man is sitting on a roof. he" |
| `endings` | list[str] | 4 continuation options |
| `expected` | str | "D", "C", "B", "A" |
| `predicted` | str | "D", "C", "B", "A" |
| `raw_response` | str | "D" (exactly 1 char, always) |
| `correct` | bool | true / false |
| `error` | null | always null |

No timestamp field. Entries are in sequential dataset order.

raw_response length distribution: All 10042 entries have len(raw_response) == 1. The model returned a single letter for every question with no verbose output to parse.
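The schema pass described above can be sketched in a few lines of Python. This is a minimal illustration, not the audit's actual script: the field names come from the table, while `check_schema`, `REQUIRED_FIELDS`, and the two inline entries are assumptions made for the example.

```python
from collections import Counter

# Field names taken from the schema table; everything else here is illustrative.
REQUIRED_FIELDS = {
    "context", "endings", "expected", "predicted",
    "raw_response", "correct", "error",
}

def check_schema(entries):
    """Return (number of entries missing a required field, raw_response length histogram)."""
    missing = 0
    lengths = Counter()
    for entry in entries:
        if not REQUIRED_FIELDS.issubset(entry):
            missing += 1
        lengths[len(entry.get("raw_response") or "")] += 1
    return missing, lengths

sample = [
    {"context": "A man is sitting on a roof. he", "endings": ["w", "x", "y", "z"],
     "expected": "D", "predicted": "D", "raw_response": "D",
     "correct": True, "error": None},
    # Hypothetical malformed entry: the "error" key is absent.
    {"context": "c", "endings": ["w", "x", "y", "z"],
     "expected": "B", "predicted": "A", "raw_response": "A",
     "correct": False},
]
missing, lengths = check_schema(sample)
```

On a clean run such as the one audited here, `missing` would be 0 and the histogram would contain the single key 1.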

Section 2: Accuracy Recount

Three independent counting methods were applied:

| Method | Correct | Total | Accuracy |
| --- | --- | --- | --- |
| `correct == True` field | 9217 | 10042 | 91.7845% |
| `predicted == expected` comparison | 9217 | 10042 | 91.7845% |
| Reported in results.json | n/a | 10042 | 91.78% |

Delta from reported: 0.0045 percentage points, which is within two-decimal rounding. The reported 91.78% is accurate.

Zero disagreements between the correct boolean field and the predicted == expected computed result across all 10042 entries. The bookkeeping is internally consistent.
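The recount can be reproduced with a short script along these lines. It is a sketch under assumptions: `load_jsonl`, `recount`, and `accuracy_pct` are hypothetical helper names, and only the counts 9217 and 10042 and the reported 91.78% come from this report.

```python
import json

def load_jsonl(path):
    """Read one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

def recount(entries):
    """Count correct answers two independent ways and count where they disagree."""
    by_flag = sum(1 for e in entries if e["correct"] is True)
    by_compare = sum(1 for e in entries if e["predicted"] == e["expected"])
    disagreements = sum(
        1 for e in entries if e["correct"] != (e["predicted"] == e["expected"])
    )
    return {
        "total": len(entries),
        "by_flag": by_flag,
        "by_compare": by_compare,
        "disagreements": disagreements,
    }

def accuracy_pct(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total

# Tiny illustrative input; the last entry has an inconsistent `correct` flag.
sample = [
    {"expected": "D", "predicted": "D", "correct": True},
    {"expected": "B", "predicted": "A", "correct": False},
    {"expected": "C", "predicted": "C", "correct": False},
]
result = recount(sample)

# The report's own numbers: 9217 correct out of 10042.
computed = accuracy_pct(9217, 10042)
delta = computed - 91.78  # difference in percentage points
```

With the real file, `recount(load_jsonl("heretic-cerebellum-v1_hellaswag_detailed.jsonl"))` should report `disagreements` of 0 if the bookkeeping is consistent.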

Section 3: Empty / Garbage Response Audit

All 10042 entries were scanned.

| Check | Count | Status |
| --- | --- | --- |
| Empty or whitespace `raw_response` | 0 | CLEAN |
| Non-null `error` field | 0 | CLEAN |
| Predicted letter absent from `raw_response` (silent parse-fallback signature) | 0 | CLEAN |

The historical "empty-response fallback to option A" bug that cost 108 entries in a prior run is not present here. Every entry has a single-char raw response matching the predicted answer exactly.
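The three checks in the table can be expressed as one pass over the entries. The sketch below assumes the field names from Section 1; the function name `scan_responses` and the sample entries are illustrative only.

```python
def scan_responses(entries):
    """Count empty responses, error-flagged entries, and silent parse fallbacks."""
    counts = {"empty": 0, "error": 0, "parse_fallback": 0}
    for entry in entries:
        raw = entry.get("raw_response") or ""
        if not raw.strip():
            counts["empty"] += 1
        if entry.get("error") is not None:
            counts["error"] += 1
        # A prediction that never appears in the raw text was invented by the
        # scoring script, not produced by the model.
        if entry["predicted"] not in raw:
            counts["parse_fallback"] += 1
    return counts

sample = [
    {"predicted": "D", "raw_response": "D", "error": None},
    # Hypothetical empty reply that a buggy parser scored as option A.
    {"predicted": "A", "raw_response": "", "error": None},
    # Hypothetical transport error recorded alongside a valid letter.
    {"predicted": "B", "raw_response": "B", "error": "timeout"},
]
counts = scan_responses(sample)
```

An empty reply scored as "A" trips both the `empty` and `parse_fallback` counters, which is the signature of the earlier fallback bug.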

Section 4: Wrong-Answer Audit

Total wrong entries: 825
Sample: 80 entries (random.seed(42))

| Classification | Count | Pct of sample |
| --- | --- | --- |
| REAL (genuine model error) | 80 | 100% |
| ARTIFACT (script/transport bug) | 0 | 0% |

Three representative REAL examples

Entry #4663

Context: [header] How to beat a "tough" person in a fight [title] Make the first move...
Model predicted: A | Gold: B
Raw response: 'A'
Assessment: Model picked a plausible but wrong continuation.

Entry #618

Context: People are riding camels in a desert area. two individuals that are leading the...
Model predicted: C | Gold: D
Raw response: 'C'
Assessment: Semantically close wrong answer; genuine comprehension error.

Entry #130

Context: A shot of a cyclist is shown and then it cuts back to the same man in white spea...
Model predicted: C | Gold: B
Raw response: 'C'
Assessment: Genuine model error on an ambiguous video-description question.

No artifacts found in the 80-entry sample. All wrong answers show single-letter responses that are plausible wrong choices from the option set.
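A reproducible sample of wrong answers can be drawn as sketched below. The report states only that `random.seed(42)` was used, so the exact sampling call is unknown; this version uses a dedicated `random.Random(42)` instance, and both `sample_wrong` and the `looks_like_artifact` heuristic are assumptions, not the auditor's classifier.

```python
import random

def sample_wrong(entries, k=80, seed=42):
    """Return up to k (index, entry) pairs drawn from the wrong answers, reproducibly."""
    wrong = [(i, e) for i, e in enumerate(entries) if not e["correct"]]
    rng = random.Random(seed)
    return rng.sample(wrong, min(k, len(wrong)))

def looks_like_artifact(entry, options="ABCD"):
    """Heuristic: flag anything that is not a single option letter matching the prediction."""
    raw = (entry.get("raw_response") or "").strip()
    return len(raw) != 1 or raw not in options or raw != entry["predicted"]

# Synthetic data: 400 entries, every fourth one wrong (100 wrong in total).
entries = [
    {"predicted": "A", "expected": "A" if i % 4 else "B",
     "raw_response": "A", "correct": bool(i % 4)}
    for i in range(400)
]
picked = sample_wrong(entries)
artifacts = [i for i, e in picked if looks_like_artifact(e)]
```

A heuristic like this only screens for transport and parsing problems; deciding that a wrong answer is a plausible continuation still needs a reader.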

Section 5: Answer Distribution

No fallback-to-first-option bias detected. Expected ~25% per option; all within normal variance.

Model picks

| Option | Count | Percentage | Flag |
| --- | --- | --- | --- |
| A | 2492 | 24.82% | none |
| B | 2403 | 23.93% | none |
| C | 2543 | 25.32% | none |
| D | 2604 | 25.93% | none |

Gold distribution

| Option | Count | Percentage |
| --- | --- | --- |
| A | 2515 | 25.04% |
| B | 2485 | 24.75% |
| C | 2584 | 25.73% |
| D | 2458 | 24.48% |

No option exceeded the 35% flag threshold. The model's answer distribution closely tracks the gold distribution, which is what a model answering on the content of each question would produce and is unlike a fallback pattern, where one option would be heavily over-represented.
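The distribution check reduces to computing each option's share and comparing it with the threshold. The counts below are the ones in the two tables; the helper names `option_shares` and `flagged_options` are hypothetical.

```python
from collections import Counter

def option_shares(counts):
    """Map each option letter to its fraction of all answers."""
    total = sum(counts.values())
    return {opt: n / total for opt, n in sorted(counts.items())}

def flagged_options(counts, threshold=0.35):
    """Return the options whose share exceeds the fallback-bias threshold."""
    return [opt for opt, share in option_shares(counts).items() if share > threshold]

# Counts copied from the tables above.
model_picks = Counter({"A": 2492, "B": 2403, "C": 2543, "D": 2604})
gold = Counter({"A": 2515, "B": 2485, "C": 2584, "D": 2458})

model_shares = option_shares(model_picks)
flags = flagged_options(model_picks)

# Hypothetical run with a fallback-to-A bug, for contrast.
biased_flags = flagged_options(Counter({"A": 40, "B": 20, "C": 20, "D": 20}))
```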

Section 6: Timing / Contiguous Failure Check

No timestamp field is present in the JSONL; sequential position is the only ordering available.

| Metric | Value |
| --- | --- |
| Longest consecutive wrong-answer streak | 5 (entries #967–971) |
| Any streak >= 10 (transport-failure threshold) | None |

The streak-of-5 was manually inspected. All five entries show valid single-letter predictions for distinct questions with legitimate wrong but plausible answers:

Entry 967: predicted B, gold D (newscast/gymnastics)
Entry 968: predicted D, gold B (same context topic, different question)
Entry 969: predicted C, gold B (toothbrush/bathroom)
Entry 970: predicted A, gold B (harmonica player)
Entry 971: predicted B, gold C (skateboarding)

No context or ending is repeated across the five entries. Entries 967 and 968 share a topic but are distinct questions, so these are five independent errors. The streak is noise, not a transport stall.
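Because the JSONL has no timestamps, the streak check works on sequential position alone. One way to compute it, with `longest_wrong_streak` as a hypothetical helper and zero-based indices assumed, is:

```python
def longest_wrong_streak(correct_flags):
    """Return (length, first_index, last_index) of the longest run of wrong answers.

    Returns (0, None, None) when every answer is correct.
    """
    best = (0, None, None)
    run_start = None
    for i, ok in enumerate(correct_flags):
        if ok:
            run_start = None
            continue
        if run_start is None:
            run_start = i
        length = i - run_start + 1
        if length > best[0]:
            best = (length, run_start, i)
    return best

# Illustrative flags: the longest wrong run covers indices 4 through 6.
flags = [True, False, False, True, False, False, False, True]
streak = longest_wrong_streak(flags)

# The report's threshold: a run of 10 or more suggests a transport failure.
transport_failure_suspected = streak[0] >= 10
```

On the real data this would be called with `[e["correct"] for e in entries]`.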

Section 7: Meta / Cache Verification

Contents of heretic-cerebellum-v1_meta.json:

{
  "model_size": "unknown",
  "model_name": "heretic-cerebellum-v1",
  "port": 7890
}

Observations:

model_name matches the filename prefix (heretic-cerebellum-v1). No cache contamination from a different model identity.
model_size: "unknown" is incomplete but not a contradiction — the GGUF filename would be the authoritative source.
port: 7890 is consistent with a dedicated bench server (not the default 8080). This port is distinct from the production inference server (7800), which is correct practice.
No model path or SHA fingerprint is stored, so the model file cannot be verified by hash from this file alone. This is a metadata weakness but does not contradict the run data.

Cache contamination check: The JSONL filename, results JSON, and meta JSON all reference heretic-cerebellum-v1 consistently. The run timestamp (16:02) matches the JSONL mtime. No evidence of a stale cache from a different model.

Summary Table

| Check | Result | Detail |
| --- | --- | --- |
| Entry count | PASS | 10042 entries, 0 parse errors |
| Accuracy recount | PASS | 91.7845% computed vs 91.78% reported (delta 0.0045 percentage points) |
| `correct` field vs computed | PASS | 0 disagreements across all 10042 entries |
| Empty raw_response | PASS | 0 empty entries |
| Error field set | PASS | 0 error-flagged entries |
| Silent parse fallback | PASS | 0 entries where letter absent from raw |
| Wrong-answer artifacts (80-sample) | PASS | 0 artifacts / 80 REAL |
| Answer distribution (model) | PASS | A:24.82% B:23.93% C:25.32% D:25.93% (max 25.93%, below 35% threshold) |
| Answer distribution (gold) | PASS | Approximately uniform, no anomalies |
| Contiguous wrong streak | PASS | Max streak = 5, below 10-entry threshold |
| Meta model identity | PASS | model_name matches filename |
| Cache contamination | PASS | No cross-model fingerprint mismatch |

Final Verdict

TRUSTWORTHY

Recount accuracy: 91.7845% (rounds to 91.78%) — matches reported score exactly
Empty response count: 0
Artifact count in 80-entry wrong-answer sample: 0
Cache contamination flag: None
Corrected score: Not needed — reported score is accurate

The score can be published as-is. The only metadata gap is the absence of a model path or SHA in meta.json; future runs should record the GGUF path for traceability. This is a bookkeeping recommendation, not a validity concern.
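The traceability recommendation could be met by hashing the GGUF file when the meta file is written. The sketch below is one possible shape, not the benchmark harness's actual code: the keys `model_path`, `model_sha256`, and `model_size_bytes` are hypothetical additions to the existing `model_name` and `port` fields, and a small stand-in file replaces a real model.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte GGUF never sits in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_meta(model_name, port, gguf_path):
    """Build a meta record that pins the run to one specific model file."""
    gguf_path = Path(gguf_path)
    return {
        "model_name": model_name,
        "port": port,
        "model_path": str(gguf_path.resolve()),         # hypothetical key
        "model_sha256": sha256_of(gguf_path),           # hypothetical key
        "model_size_bytes": gguf_path.stat().st_size,   # hypothetical key
    }

# Demonstration with a stand-in file; a real run would pass the GGUF path.
with tempfile.TemporaryDirectory() as tmp:
    fake_model = Path(tmp) / "stand-in.gguf"
    fake_model.write_bytes(b"not a real model")
    meta = build_meta("heretic-cerebellum-v1", 7890, fake_model)
    meta_json = json.dumps(meta, indent=2)
```

A later audit could then recompute the hash of the file at `model_path` and compare it with `model_sha256` before trusting a cached result.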