Instructions for using deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF",
    filename="Qwen3.6-35B-A3B-Heretic-Cerebellum-v1-Q3_K_M.gguf",
)

# Image inputs generally require a multimodal projector (mmproj) loaded via a
# chat handler; without one, send text-only messages.
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Use pre-built binary
```shell
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
./llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Use Docker
docker model run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
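A running `llama-server` can also be called from code. The sketch below uses only the standard library. It assumes the server listens on its default address `http://localhost:8080`, the same URL used in the Pi and Hermes configurations below. It also assumes the server exposes the OpenAI-style `/v1/chat/completions` route. `build_chat_request` and `chat` are illustrative helper names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server's default host/port and OpenAI-compatible route.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Call `chat("Summarize HellaSwag in one sentence.")` while the server is up. Request construction is kept separate from sending, so the payload can be inspected without a running server.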
- LM Studio
- Jan
- vLLM
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
Use Docker
docker model run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
- Ollama
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Ollama:
ollama run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
- Unsloth Studio
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
No setup is required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF to start chatting.
- Pi
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Pi:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Hermes Agent:
Start the llama.cpp server
```shell
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Run Hermes
hermes
- Atomic Chat
- Docker Model Runner
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Docker Model Runner:
docker model run hf.co/deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
- Lemonade
How to use deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull deucebucket/Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF:Q3_K_M
```
Run and chat with the model
lemonade run user.Qwen3.6-35B-A3B-Heretic-Cerebellum-GGUF-Q3_K_M
List all available models
lemonade list
HellaSwag Benchmark Audit: heretic-cerebellum-v1
Date: 2026-06-11
Auditor: Adversarial automated audit (adversarial-hellaswag-audit)
Benchmark run timestamp: 2026-06-11 16:02
File audited: heretic-cerebellum-v1_hellaswag_detailed.jsonl
Reported score: 91.78%
Verdict: TRUSTWORTHY
All six check categories pass. The reported score of 91.78% is correct. No artifacts, no fallbacks, no transport failures, no cache contamination detected.
Section 1: Schema Reconnaissance
Fields present in every entry (10042/10042):
| Field | Type | Sample values |
|---|---|---|
| `context` | str | "A man is sitting on a roof. he" |
| `endings` | list[str] | 4 continuation options |
| `expected` | str | "D", "C", "B", "A" |
| `predicted` | str | "D", "C", "B", "A" |
| `raw_response` | str | "D" (exactly 1 char, always) |
| `correct` | bool | true / false |
| `error` | null | always null |
No timestamp field. Entries are in sequential dataset order.
raw_response length distribution: All 10042 entries have len(raw_response) == 1. The model returned a single letter for every question with no verbose output to parse.
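The schema pass above can be sketched as follows. The sketch assumes the JSONL holds one JSON object per line with the seven fields from the table. `load_jsonl` and `check_schema` are hypothetical helper names, not the audit tool's actual code.

```python
import json
from collections import Counter

# Field names taken from the schema table above.
EXPECTED_FIELDS = {"context", "endings", "expected", "predicted",
                   "raw_response", "correct", "error"}


def load_jsonl(path):
    """Parse one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def check_schema(entries):
    """Count entries missing any expected field and tally raw_response lengths."""
    missing = sum(1 for e in entries if not EXPECTED_FIELDS.issubset(e))
    # A missing or null raw_response counts as length 0.
    lengths = Counter(len(e.get("raw_response") or "") for e in entries)
    return {"total": len(entries), "missing_fields": missing,
            "raw_response_lengths": dict(lengths)}
```

Applied to this run, the expected report is 10042 entries, 0 missing fields, and a length histogram of `{1: 10042}`.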
Section 2: Accuracy Recount
Two independent counts were computed and compared against the reported score:
| Method | Correct | Total | Accuracy |
|---|---|---|---|
| `correct == True` field | 9217 | 10042 | 91.7845% |
| `predicted == expected` comparison | 9217 | 10042 | 91.7845% |
| Reported in results.json | n/a | 10042 | 91.78% |
Delta from reported: 0.0045 percentage points, within two-decimal rounding. The reported 91.78% is accurate.
Zero disagreements between the correct boolean field and the predicted == expected computed result across all 10042 entries. The bookkeeping is internally consistent.
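The two counts, the disagreement check, and the rounding comparison can be reproduced with a short sketch. It assumes each entry has already been parsed from the JSONL into a dict with `expected`, `predicted`, and `correct` keys. `recount` is an illustrative helper name.

```python
def recount(entries, reported_pct):
    """Recount accuracy two ways and compare with the reported percentage."""
    total = len(entries)
    by_flag = sum(1 for e in entries if e["correct"] is True)
    by_compare = sum(1 for e in entries if e["predicted"] == e["expected"])
    # Entries where the stored boolean disagrees with the recomputed comparison.
    disagreements = sum(
        1 for e in entries if e["correct"] != (e["predicted"] == e["expected"])
    )
    pct = 100.0 * by_flag / total
    return {
        "by_flag": by_flag,
        "by_compare": by_compare,
        "disagreements": disagreements,
        "accuracy_pct": pct,
        "delta_pp": abs(pct - reported_pct),  # difference in percentage points
        "matches_at_2dp": round(pct, 2) == reported_pct,
    }
```

With 9217 correct out of 10042, `accuracy_pct` is about 91.7845 and `delta_pp` is about 0.0045. The score therefore rounds to the reported 91.78.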
Section 3: Empty / Garbage Response Audit
All 10042 entries were scanned.
| Check | Count | Status |
|---|---|---|
| Empty or whitespace `raw_response` | 0 | CLEAN |
| Non-null `error` field | 0 | CLEAN |
| Predicted letter absent from `raw_response` (silent parse-fallback signature) | 0 | CLEAN |
The historical "empty-response fallback to option A" bug that cost 108 entries in a prior run is not present here. Every entry has a single-char raw response matching the predicted answer exactly.
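The three checks in the table can be sketched as follows. A response counts as a silent fallback when a predicted letter is missing or does not appear in the raw text. This definition follows the table's wording and may differ in detail from the audit script. `garbage_audit` is a hypothetical name.

```python
def garbage_audit(entries):
    """Count the empty, error-flagged, and silent-fallback signatures."""
    counts = {"empty": 0, "error_set": 0, "silent_fallback": 0}
    for e in entries:
        raw = e.get("raw_response") or ""
        pred = e.get("predicted") or ""
        if not raw.strip():
            counts["empty"] += 1
        if e.get("error") is not None:
            counts["error_set"] += 1
        # Silent parse fallback: the predicted letter never appears in the raw
        # model text (e.g. a blank output that the harness mapped to "A").
        if not pred or pred not in raw:
            counts["silent_fallback"] += 1
    return counts
```

A run affected by the historical empty-response bug would show matching nonzero counts under `empty` and `silent_fallback`. This run shows zeros in all three.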
Section 4: Wrong-Answer Audit
Total wrong entries: 825
Sample: 80 entries (random.seed(42))
| Classification | Count | Pct of sample |
|---|---|---|
| REAL (genuine model error) | 80 | 100% |
| ARTIFACT (script/transport bug) | 0 | 0% |
Three representative REAL examples

Entry #4663
- Context: [header] How to beat a "tough" person in a fight [title] Make the first move...
- Model predicted: A | Gold: B
- Raw response: 'A'
- Assessment: Model picked a plausible but wrong continuation.

Entry #618
- Context: People are riding camels in a desert area. two individuals that are leading the...
- Model predicted: C | Gold: D
- Raw response: 'C'
- Assessment: Semantically close wrong answer; genuine comprehension error.

Entry #130
- Context: A shot of a cyclist is shown and then it cuts back to the same man in white spea...
- Model predicted: C | Gold: B
- Raw response: 'C'
- Assessment: Genuine model error on an ambiguous video-description question.
No artifacts found in the 80-entry sample. All wrong answers show single-letter responses that are plausible wrong choices from the option set.
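The audit records only the seed for its 80-entry sample. The sketch below assumes a seeded `random.sample` over the wrong entries in file order. Because the exact call is not recorded, the sketch will not necessarily reproduce the same 80 entries. `sample_wrong` is a hypothetical helper.

```python
import random


def sample_wrong(entries, k=80, seed=42):
    """Return up to k (index, entry) pairs drawn from the wrong answers."""
    wrong = [(i, e) for i, e in enumerate(entries) if not e["correct"]]
    # Assumption: a dedicated seeded generator; random.Random(42) yields the
    # same stream as calling random.seed(42) on the module-level generator.
    rng = random.Random(seed)
    return rng.sample(wrong, min(k, len(wrong)))
```

A dedicated `random.Random` instance keeps the draw reproducible even if other code also uses the module-level generator.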
Section 5: Answer Distribution
No fallback-to-first-option bias detected. Expected ~25% per option; all within normal variance.
Model picks
| Option | Count | Percentage | Flag |
|---|---|---|---|
| A | 2492 | 24.82% | none |
| B | 2403 | 23.93% | none |
| C | 2543 | 25.32% | none |
| D | 2604 | 25.93% | none |
Gold distribution
| Option | Count | Percentage |
|---|---|---|
| A | 2515 | 25.04% |
| B | 2485 | 24.75% |
| C | 2584 | 25.73% |
| D | 2458 | 24.48% |
No option breached the 35% flagging threshold. The model's answer distribution closely tracks the gold distribution, which is the expected signature of a model choosing on content rather than falling back to a default letter.
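The percentages and the 35% bias flag can be recomputed from the letter counts in the two tables. The sketch assumes that only the options A to D occur. `distribution` is an illustrative name.

```python
from collections import Counter


def distribution(letters, threshold_pct=35.0):
    """Per-option percentages (2 d.p.) and the options above the bias threshold."""
    counts = Counter(letters)
    total = sum(counts.values())
    pct = {opt: round(100.0 * counts[opt] / total, 2) for opt in "ABCD"}
    flagged = [opt for opt, p in pct.items() if p > threshold_pct]
    return pct, flagged
```

Feeding in the model's counts (2492, 2403, 2543, 2604) reproduces 24.82 / 23.93 / 25.32 / 25.93 with no flagged option. The gold counts give 25.04 / 24.75 / 25.73 / 24.48.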
Section 6: Timing / Contiguous Failure Check
No timestamp field is present in the JSONL; sequential position is the only ordering available.
| Metric | Value |
|---|---|
| Longest consecutive wrong-answer streak | 5 (entries #967-971) |
| Any streak >= 10 (transport-failure threshold) | None |
The streak-of-5 was manually inspected. All five entries show valid single-letter predictions for distinct questions with legitimate wrong but plausible answers:
- Entry 967: predicted B, gold D (newscast/gymnastics)
- Entry 968: predicted D, gold B (same context topic, different question)
- Entry 969: predicted C, gold B (toothbrush/bathroom)
- Entry 970: predicted A, gold B (harmonica player)
- Entry 971: predicted B, gold C (skateboarding)
No repeated contexts or endings: these are five independent questions. The streak is noise, not a transport stall.
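The streak check amounts to a single pass over the entries in file order. In the sketch, indices are 0-based file positions. The audit does not state whether its entry numbers (#967-971) are 0- or 1-based. `longest_wrong_streak` is a hypothetical helper.

```python
def longest_wrong_streak(entries):
    """Longest run of consecutive wrong answers as (length, start, end)."""
    best = (0, None, None)
    run_start = None
    for i, e in enumerate(entries):
        if e["correct"]:
            run_start = None  # a correct answer ends the current run
            continue
        if run_start is None:
            run_start = i
        length = i - run_start + 1
        if length > best[0]:
            best = (length, run_start, i)
    return best
```

A transport stall would appear as a run at or above the 10-entry threshold. This run's maximum is 5.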
Section 7: Meta / Cache Verification
Contents of heretic-cerebellum-v1_meta.json:
```json
{
  "model_size": "unknown",
  "model_name": "heretic-cerebellum-v1",
  "port": 7890
}
```
Observations:
- `model_name` matches the filename prefix (`heretic-cerebellum-v1`). No cache contamination from a different model identity.
- `model_size: "unknown"` is incomplete but not a contradiction; the GGUF filename would be the authoritative source.
- `port: 7890` is consistent with a dedicated bench server (not the default 8080). This port is distinct from the production inference server (7800), which is correct practice.
- No model path or SHA fingerprint is stored, so the model weights cannot be fingerprint-verified from this file alone. This is a metadata weakness but does not contradict the run data.
Cache contamination check: The JSONL filename, results JSON, and meta JSON all reference heretic-cerebellum-v1 consistently. The run timestamp (16:02) matches the JSONL mtime. No evidence of a stale cache from a different model.
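The identity checks can be sketched against the meta.json shown above. The sketch assumes the `<model>_hellaswag_detailed.jsonl` naming pattern seen in this audit. The fingerprint keys `model_path` and `sha256` are hypothetical, since the current meta.json has neither. `check_meta` is an illustrative name.

```python
import json


def check_meta(meta_text, jsonl_filename):
    """Compare meta.json's model_name with the JSONL filename prefix."""
    meta = json.loads(meta_text)
    suffix = "_hellaswag_detailed.jsonl"  # naming pattern seen in this audit
    prefix = (jsonl_filename[:-len(suffix)]
              if jsonl_filename.endswith(suffix) else None)
    return {
        "name_matches": meta.get("model_name") == prefix,
        "size_known": meta.get("model_size") not in (None, "unknown"),
        # Hypothetical keys a future run could record for traceability.
        "has_fingerprint": any(k in meta for k in ("model_path", "sha256")),
    }
```

For this run, the expected result is a matching name, an unknown size, and no fingerprint, which agrees with the observations above.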
Summary Table
| Check | Result | Detail |
|---|---|---|
| Entry count | PASS | 10042 entries, 0 parse errors |
| Accuracy recount | PASS | 91.7845% computed vs 91.78% reported (delta 0.0045 percentage points) |
| `correct` field vs computed | PASS | 0 disagreements across all 10042 entries |
| Empty raw_response | PASS | 0 empty entries |
| Error field set | PASS | 0 error-flagged entries |
| Silent parse fallback | PASS | 0 entries where letter absent from raw |
| Wrong-answer artifacts (80-sample) | PASS | 0 artifacts / 80 REAL |
| Answer distribution (model) | PASS | A:24.82% B:23.93% C:25.32% D:25.93% (max 25.93%, below 35% threshold) |
| Answer distribution (gold) | PASS | Uniform, no anomalies |
| Contiguous wrong streak | PASS | Max streak = 5, below 10-entry threshold |
| Meta model identity | PASS | model_name matches filename |
| Cache contamination | PASS | No cross-model fingerprint mismatch |
Final Verdict
TRUSTWORTHY
- Recount accuracy: 91.7845% (rounds to 91.78%), matching the reported score at two decimal places
- Empty response count: 0
- Artifact count in 80-entry wrong-answer sample: 0
- Cache contamination flag: None
- Corrected score: Not needed; the reported score is accurate
The score can be published as-is. The only metadata gap is the absence of a model path or SHA in meta.json; future runs should record the GGUF path for traceability. This is a bookkeeping recommendation, not a validity concern.