Instructions to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with llama-cpp-python:

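# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q3_K_M quant from the Hugging Face Hub and load it
# (Llama.from_pretrained requires the huggingface-hub package).
llm = Llama.from_pretrained(
    repo_id="deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF",
    filename="Qwen3.6-35B-A3B-Heretic-Cerebellum-v1-Q3_K_M.gguf",
)

# Assumption: for the image to actually reach the model, llama-cpp-python
# needs a multimodal chat_handler with a matching projector (mmproj) file,
# which this snippet does not configure.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)

# The result is an OpenAI-style dict; print the reply text.
print(response["choices"][0]["message"]["content"])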

Local Apps

llama.cpp

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
./llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Use Docker

docker model run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

LM Studio
Jan

vLLM

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
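# Start the vLLM server
# (vLLM's GGUF support is experimental and loads single-file GGUF models; you may
# need to serve the local path of one downloaded .gguf file and pass --tokenizer
# with the original non-GGUF model repository):
vllm serve "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF"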
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
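The same request can be sent from Python. This is a minimal sketch using only the standard library (`urllib.request`), assuming the server listens on `http://localhost:8000/v1` as in the curl example above; `build_chat_request` and `send_chat_request` are hypothetical helper names, not part of vLLM. Because the endpoint is OpenAI-compatible, the same helpers should also work against `llama-server` by changing `base_url` (its default port is 8080).

```python
import json
import urllib.request

MODEL_ID = "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF"
IMAGE_URL = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"


def build_chat_request(base_url: str, model: str, text: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# req = build_chat_request("http://localhost:8000/v1", MODEL_ID,
#                          "Describe this image in one sentence.", IMAGE_URL)
# print(send_chat_request(req))
```

Separating request construction from sending keeps the payload inspectable without a live server.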

Ollama
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Ollama:
```
ollama run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```

Unsloth Studio

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF to start chatting

Pi

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Run Hermes

hermes

Atomic Chat
Docker Model Runner
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Docker Model Runner:
```
docker model run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```

Lemonade

How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M

Run and chat with the model

lemonade run user.Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF-Q3_K_M

List all available models

lemonade list

Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF/benchmark_results/AUDIT_arc_mmlu.md, last changed in commit 9cdbf39 by deucebucket ("results: full audited benchmark suite + adversarial audit reports")

Benchmark Audit: ARC-Challenge + MMLU-Redux

Model: heretic-cerebellum-v1
Auditor: adversarial / automated
Audit date: 2026-06-11
Audited files:

heretic-cerebellum-v1_arc_detailed.jsonl (1172 entries)
heretic-cerebellum-v1_mmlu_redux_detailed.jsonl (2400 entries)

Verdict

Benchmark	Verdict	Reported	Recount	Artifact errors
ARC-Challenge	TRUSTWORTHY	95.48%	95.48%	0
MMLU-Redux	TRUSTWORTHY	75.42%	75.42%	0

No artifacts, no parse failures, no label-format bugs, no truncation signature detected. All 1172 ARC and 2400 MMLU entries are internally consistent.

1. Schema Reconnaissance

Both files use the same schema per line:

{
  "question": "...",
  "choices": ["A text", "B text", "C text", "D text"],
  "expected": "C",
  "predicted": "C",
  "raw_response": "C",
  "correct": true,
  "error": null
}

MMLU adds a "subject" field. The raw_response field stores exactly what the model returned — in every entry across both benchmarks this is a single uppercase letter (A/B/C/D). There are no multi-token completions, no reasoning traces, no chain-of-thought artifacts.
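A minimal sketch of reading these files in Python, assuming one JSON object per line with exactly the fields shown above plus the optional MMLU `subject`. `Entry`, `parse_entries`, and `load_entries` are illustrative names, not the auditor's tooling.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Entry:
    """One record of a *_detailed.jsonl file (schema shown above)."""
    question: str
    choices: List[str]
    expected: str
    predicted: str
    raw_response: str
    correct: bool
    error: Optional[str] = None
    subject: Optional[str] = None  # MMLU files only


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse JSONL lines, skipping blank ones.

    An unexpected key or a missing required field raises TypeError,
    which doubles as a cheap schema check.
    """
    for line in lines:
        if line.strip():
            yield Entry(**json.loads(line))


def load_entries(path: str) -> List[Entry]:
    """Read every record from a JSONL file on disk."""
    with open(path, encoding="utf-8") as f:
        return list(parse_entries(f))
```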

2. Aggregate Verification

Recount performed by independently summing correct == true flags:

Benchmark	Total	Correct	Wrong	Recount acc	Reported acc	Match
ARC	1172	1119	53	95.48%	95.48%	YES
MMLU	2400	1810	590	75.42%	75.42%	YES

Both recounts match the reported accuracies to two decimal places, so the summary JSON files agree with the per-entry data.
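The recount is a plain sum over the per-entry `correct` flags. A minimal sketch (the `recount` name is illustrative), checked against the counts in the table above:

```python
def recount(correct_flags):
    """Return (total, correct, wrong, accuracy %) from per-entry `correct` flags."""
    flags = list(correct_flags)
    total = len(flags)
    n_correct = sum(1 for c in flags if c is True)
    acc = 100.0 * n_correct / total if total else 0.0
    return total, n_correct, total - n_correct, round(acc, 2)


# Reproduce the table's arithmetic from its own counts.
print(recount([True] * 1119 + [False] * 53))   # (1172, 1119, 53, 95.48)
print(recount([True] * 1810 + [False] * 590))  # (2400, 1810, 590, 75.42)
```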

3. Wrong-Answer Classification

ARC-Challenge: all 53 wrong entries

Class	Count
REAL_ERROR (model chose wrong letter)	53
ARTIFACT_EMPTY	0
ARTIFACT_UNPARSEABLE	0
ARTIFACT_PARSE_MISMATCH	0
ARTIFACT_NUMERIC_LABEL	0

0 artifacts out of 53 wrong answers.
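The audit does not publish its classification rules. As a hedged sketch, the five classes could be assigned with simple string checks like the ones below; the rules, the regex, and the `classify_wrong` and `tally` names are assumptions for illustration, not the auditor's code.

```python
import re
from collections import Counter


def classify_wrong(raw_response, predicted, expected):
    """Assign one wrong answer to an audit class.

    Assumed rules for illustration only; the audit does not state its criteria.
    """
    raw = (raw_response or "").strip()
    if not raw:
        return "ARTIFACT_EMPTY"
    # Numeric labels ("1".."4") on either side signal a label-format mismatch.
    if raw.isdigit() or str(expected).strip().isdigit():
        return "ARTIFACT_NUMERIC_LABEL"
    # Standalone option letters; crude, since a word like "A" in prose also matches.
    letters = re.findall(r"\b([A-D])\b", raw)
    if not letters:
        return "ARTIFACT_UNPARSEABLE"
    if predicted not in letters:
        return "ARTIFACT_PARSE_MISMATCH"
    return "REAL_ERROR"


def tally(wrong_entries):
    """Count classes over dict entries that follow the audited schema."""
    return Counter(
        classify_wrong(e["raw_response"], e["predicted"], e["expected"])
        for e in wrong_entries
    )
```

Since every raw response in these files is a single letter that equals `predicted`, rules of this kind would place every wrong entry in REAL_ERROR, consistent with the counts above.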

MMLU-Redux: 60-entry random sample (seed=42) of 590 wrong entries

Class	Count
REAL_ERROR (model chose wrong letter)	60
ARTIFACT_EMPTY	0
ARTIFACT_UNPARSEABLE	0
ARTIFACT_PARSE_MISMATCH	0
ARTIFACT_NUMERIC_LABEL	0

0 artifacts out of 60 sampled wrong answers. With 0 artifacts in 60 draws, the 95% Wilson score interval for artifact prevalence across all 590 wrong answers is 0% to 6.0%. At the upper end of that interval, about 35 of the 590 wrong answers could be artifacts; even if every one of them were re-scored as correct, the score would rise only to 75.42% + (35/2400) × 100 ≈ 76.9%. Because re-scoring an artifact can only leave a wrong answer wrong or turn it right, the reported 75.42% is a lower bound.
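The interval and the worst-case arithmetic above can be reproduced with the Wilson score formula. For 0 successes in n trials the upper bound simplifies to z²/(n + z²), which is 3.8416/63.8416 ≈ 6.0% for n = 60. The sketch also shows a seeded draw with Python's `random.Random(42)`; the audit does not say which RNG or sampling call it used, so this is not guaranteed to reproduce its 60 entries. The helper names are hypothetical.

```python
import math
import random


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sample_wrong(wrong_entries, k=60, seed=42):
    """Draw a reproducible random sample of wrong entries with a seeded RNG."""
    return random.Random(seed).sample(list(wrong_entries), k)


low, high = wilson_interval(0, 60)
worst_case_artifacts = high * 590       # about 35.5 of the 590 MMLU wrong answers
ceiling = 75.42 + 100 * 35 / 2400       # score if 35 of them were re-scored correct
print(f"{low:.1%} to {high:.1%}, ~{worst_case_artifacts:.1f} entries, {ceiling:.2f}%")
```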

4. Distribution Checks

4a. Choice distribution (predicted vs gold)

ARC:

Choice	Predicted	Gold	Delta
A	269	266	+3
B	312	311	+1
C	304	310	-6
D	287	285	+2

All deltas are within ±6, so there is no evidence of the parser defaulting to any single choice.

MMLU:

Choice	Predicted	Gold	Delta
A	497	537	-40
B	614	600	+14
C	613	606	+7
D	676	657	+19

The model under-picks A and over-picks D relative to gold distribution. This is a model-level tendency, not a parser artifact — A-defaulting (the known parser-fallback bug) would produce the opposite signature (over-picking A).
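A small sketch of how the predicted-versus-gold tables can be computed; `choice_deltas` is an illustrative name. Here it is fed letter strings rebuilt from the MMLU counts above, but in practice it would take the `predicted` and `expected` letters from every entry.

```python
from collections import Counter


def choice_deltas(predicted, gold, letters="ABCD"):
    """Per-letter (predicted count, gold count, predicted minus gold)."""
    p, g = Counter(predicted), Counter(gold)
    return {c: (p[c], g[c], p[c] - g[c]) for c in letters}


# Rebuild the MMLU table above from its letter counts.
mmlu_pred = "A" * 497 + "B" * 614 + "C" * 613 + "D" * 676
mmlu_gold = "A" * 537 + "B" * 600 + "C" * 606 + "D" * 657
for letter, (pred_n, gold_n, delta) in choice_deltas(mmlu_pred, mmlu_gold).items():
    print(f"{letter}\t{pred_n}\t{gold_n}\t{delta:+d}")
```

Because both columns count the same 2400 entries, the deltas always sum to zero; an A-defaulting parser would show up as one large positive delta on A balanced by negatives elsewhere.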

4b. Empty/whitespace raw responses across ALL entries

ARC: 0 / 1172
MMLU: 0 / 2400

No empty responses anywhere.

4c. Parsed choice absent from raw_response (all entries with long responses)

All 3572 entries have single-character raw responses. The parsed predicted field equals raw_response in 100% of entries (0 mismatches in either benchmark).

4d. `correct` flag internal consistency

ARC entries where correct=True but predicted != expected: 0
MMLU entries where correct=True but predicted != expected: 0
ARC entries where correct=False but predicted == expected: 0
MMLU entries where correct=False but predicted == expected: 0

The correct flag is computed correctly from predicted == expected with no exceptions.
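Checks 4b to 4d amount to one pass over the entries. A minimal sketch over dicts with the audited schema; `consistency_report` and its counter names are illustrative, and comparing `predicted` with the stripped `raw_response` assumes single-letter responses as observed here.

```python
def consistency_report(entries):
    """Count the 4b to 4d failure modes over dict entries with the audited schema."""
    report = {
        "empty_raw": 0,          # 4b: empty or whitespace-only raw_response
        "pred_not_raw": 0,       # 4c: predicted differs from the stripped raw_response
        "true_but_mismatch": 0,  # 4d: correct=True yet predicted != expected
        "false_but_match": 0,    # 4d: correct=False yet predicted == expected
    }
    for e in entries:
        raw = (e.get("raw_response") or "").strip()
        if not raw:
            report["empty_raw"] += 1
        if e["predicted"] != raw:
            report["pred_not_raw"] += 1
        match = e["predicted"] == e["expected"]
        if e["correct"] and not match:
            report["true_but_mismatch"] += 1
        if not e["correct"] and match:
            report["false_but_match"] += 1
    return report
```

A clean file, as reported for both benchmarks, yields all four counters at zero.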

4e. First-option bias among wrong answers

ARC wrong answers predicted as 'A': 12/53 = 22.6% (expected if random: 25%)
MMLU wrong answers predicted as 'A': 112/590 = 19.0% (expected if random: 25%)

If anything, the model slightly under-picks 'A' when wrong, so there is no sign of first-option parser bias.
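The 25% baseline assumes gold answers are spread evenly over A to D and that a wrong model picks uniformly among the three remaining options, giving 3/4 × 1/3 = 1/4. A small sketch of both quantities with hypothetical helper names; `expected_share` also lets the baseline follow an uneven gold distribution.

```python
def first_option_share(wrong_predictions, letter="A"):
    """Fraction of wrong answers whose predicted letter is `letter`."""
    preds = list(wrong_predictions)
    return sum(p == letter for p in preds) / len(preds)


def expected_share(gold_counts, letter="A"):
    """Expected share of `letter` among wrong answers if a wrong model picked
    uniformly among the three non-gold options of a 4-choice question."""
    total = sum(gold_counts.values())
    return (total - gold_counts[letter]) / total / 3


# ARC: 12 of 53 wrong answers were 'A' (the other letters are placeholders).
print(f"{first_option_share(['A'] * 12 + ['B'] * 41):.1%}")
# MMLU gold counts from section 4a give a baseline slightly above 25%.
print(f"{expected_share({'A': 537, 'B': 600, 'C': 606, 'D': 657}):.1%}")
```

With the MMLU gold counts, the baseline is 1863/7200 ≈ 25.9%, so the observed 19.0% sits even further below it.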

5. Truncation Analysis (MMLU)

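Prompt length was approximated as len(question) + sum(len(choice) for choice in choices). This character count ignores tokenization and the prompt template, so it is only a proxy for the actual tokenized prompt, but it should preserve the long-vs-short ordering well enough for a truncation check.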

Wrong rate by prompt-length decile:

Decile	Len range (chars)	Wrong rate
1 (shortest)	17–101	24.2%
2	101–129	23.8%
3	129–154	29.2%
4	154–184	25.4%
5	184–219	25.4%
6	219–259	22.1%
7	259–315	25.0%
8	316–382	25.0%
9	382–489	25.0%
10 (longest)	489–4872	20.8%

No truncation signature. The longest decile (489–4872 chars, including a 4872-char outlier) has the lowest wrong rate (20.8%), not the highest. If context truncation were occurring, deciles 9–10 would show elevated error rates. The distribution is roughly flat across deciles (20.8% to 29.2%, around the overall 24.6% wrong rate); the minor high point in decile 3 (29.2%) is more plausibly driven by subject difficulty than by length.
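The audit does not say exactly how the deciles were formed. The rates in the table are consistent with ten equal-count bins of 240 entries each, so this sketch assumes equal-count bins after sorting by the character-length proxy; `prompt_len` and `wrong_rate_by_decile` are illustrative names.

```python
def prompt_len(entry):
    """Character-length proxy from the audit: question plus all choices."""
    return len(entry["question"]) + sum(len(c) for c in entry["choices"])


def wrong_rate_by_decile(entries, n_bins=10):
    """Sort by prompt length, split into n_bins near-equal groups, and return
    (min_len, max_len, wrong_rate) for each group, shortest first."""
    ranked = sorted(entries, key=prompt_len)
    n = len(ranked)
    out = []
    for i in range(n_bins):
        chunk = ranked[i * n // n_bins:(i + 1) * n // n_bins]
        if not chunk:
            continue
        wrong = sum(1 for e in chunk if not e["correct"])
        out.append((prompt_len(chunk[0]), prompt_len(chunk[-1]), wrong / len(chunk)))
    return out
```

A truncation bug would appear as wrong rates climbing toward the last bins; equal-count bins keep each rate based on the same number of questions.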

Mean prompt length of correct vs wrong answers:

Correct: 277 chars
Wrong: 259 chars

Wrong answers are marginally shorter in prompt length on average, the opposite of what truncation would produce.

ARC truncation check:

Wrong answers mean prompt length (217) < correct answers (249). Same anti-truncation pattern.

No context-per-slot truncation artifacts in either benchmark.

6. Known Historical Bug Cross-Check

Bug	Check	Status
Numeric-vs-letter label mismatch (cost 19 ARC questions)	ARTIFACT_NUMERIC_LABEL count	0 in both
Empty responses counted as wrong	empty raw_response	0 in both
Parser fallback picking first option (A-bias)	wrong-answer A% vs expected 25%	ARC 22.6%, MMLU 19.0% — no bias
API errors counted as wrong	`error` field non-null	0 in both
Context-per-slot truncation	prompt-length decile wrong rate	Flat; longest decile lowest error

None of the five known historical bugs is present.

7. Corrected Scores

No correction needed. The recount matches the reported scores exactly, and no artifact errors were found among the 53 exhaustively audited ARC wrong answers or the 60 sampled MMLU wrong answers.

Final scores:

ARC-Challenge: 95.48% (1119/1172) — confirmed
MMLU-Redux: 75.42% (1810/2400) — confirmed