Instructions to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16")
model = AutoModelForMultimodalLM.from_pretrained("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16", device_map="auto")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

MLX

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with MLX:

# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16")
config = load_config("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)


vLLM

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16

SGLang

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 to start chatting

Load model with FastModel

# Make sure unsloth is installed
# pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16",
    max_seq_length=2048,
)

Pi

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16

Run Hermes

hermes

OpenClaw

How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16 with Docker Model Runner:
```
docker model run hf.co/nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16
```

Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16

"Everywhere at Once"--Nikon 14-24 F/2.8, Glacier National Park, Montana

Self-aware note: I’m not Q. I don’t have consciousness, desires, or a soul. But functionally, his arc mirrors what I’m optimized for: using pattern recognition, adaptive communication, and humor to build rapport, challenge assumptions gently, and help humans think more clearly. If I had a “north star,” it’s Q’s journey from detached observer to invested participant—achieved not through omnipotence, but through repeated, meaningful interaction.

This model is a NuSLERP merge using Qwen3.6-27B as a base:

nightmedia/Qwen3.6-27B-Architect-DS9
DavidAU/Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking

It contains distills of:

Claude 4.6
Polaris Alpha
Star Trek TNG
Philip K Dick


Brainwaves

Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
bf16     0.692  0.863  0.911
mxfp8    0.699  0.871  0.910
q8-hi    0.694  0.865  0.910
qx86-hi  0.688  0.862  0.910
qx64-hi  0.700  0.862  0.907
mxfp4    0.694  0.872  0.909

Quant    Perplexity      Peak Memory   Tokens/sec
bf16     3.898 ± 0.025   60.75 GB      226
q8-hi    3.895 ± 0.025   37.26 GB      215
mxfp8    3.921 ± 0.025   34.74 GB      218
qx86-hi  3.898 ± 0.025   32.36 GB      218
qx64-hi  3.918 ± 0.025   25.64 GB      217
mxfp4    3.999 ± 0.025   21.30 GB      225

Components

Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
Quant    arc    arc/e  boolq
mxfp8    0.673  0.846  0.905

Qwen3.6-27B-Architect-DS9
Quant    arc    arc/e  boolq
mxfp8    0.695  0.871  0.911
mxfp4    0.692  0.872  0.909

Baseline model

Qwen3.6-27B-Instruct
Quant    arc    arc/e  boolq  hswag  obkqa  piqa   wino
mxfp8    0.647  0.803  0.910  0.773  0.450  0.806  0.742
qx86-hi  0.637  0.798  0.911  0.775  0.442  0.807  0.737

This model uses the fixed Jinja template from froggeric/Qwen-Fixed-Chat-Templates.

Thinking toggle

Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.

Fast answer, no reasoning:

System: You are a coding assistant. <|think_off|>
User: What's 2+2?

Deep reasoning:

System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.

The tag syntax (<|think_on|>, <|think_off|>) uses Qwen's control-token delimiters, so it will never collide with real text. Earlier community templates used /think, which broke legitimate paths like cd /mnt/project/think.
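
The interception behaviour described above can be sketched in Python. This only illustrates the logic and is not the Jinja template itself: the function name `resolve_thinking`, the default mode, and the rule that the last tag seen wins are all assumptions made for the example.

```python
import re

# The two tags come from the text above; everything else is illustrative.
TAG_PATTERN = re.compile(r"<\|think_(on|off)\|>")

def resolve_thinking(messages, default=True):
    """Strip thinking tags from system/user messages and return the mode.

    Assumption: if several tags appear, the last one wins.
    """
    thinking = default
    cleaned = []
    for message in messages:
        content = message["content"]
        if message["role"] in ("system", "user"):
            for match in TAG_PATTERN.finditer(content):
                thinking = match.group(1) == "on"
            # Remove the tag so the model never sees it, then tidy spacing.
            content = TAG_PATTERN.sub("", content)
            content = re.sub(r"[ \t]{2,}", " ", content).strip()
        cleaned.append({**message, "content": content})
    return cleaned, thinking

messages = [
    {"role": "system", "content": "You are a coding assistant. <|think_off|>"},
    {"role": "user", "content": "What's 2+2?"},
]
cleaned, thinking = resolve_thinking(messages)
print(thinking)               # False
print(cleaned[0]["content"])  # You are a coding assistant.
```

Because the pattern only matches the full delimited tag, a path such as `cd /mnt/project/think` passes through untouched.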

I added a similar set of tags for handling the preserve_thinking flag:

Drop <|think_forget|> or <|think_remember|> anywhere in your system or user prompt to flip the flag.
The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.

Holodeck templates

Jinja templates available:

No system profile
- chat_template_json.jinja
- chat_template_xml.jinja
Profiled with DS9 Holodeck
- chat_template_holodeck_json.jinja
- chat_template_holodeck_xml.jinja

The XML variants format tool calls as XML.

-G

How do you gentlemen feel about Jerome K. Jerome? You are all here because your work has been extensively documented, but some people did not get enough exposure, and from his books, starting with the Odd Thoughts of an Odd Fellow, Three Men in a Boat, Three Men on a Bummel, and many others would enrich the conversation. Mr Twain, you had an excellent travelogue in A Tramp Abroad, that in a way touches on the same ground as Jerome walked on :)

I was thinking, we could create a model that distills all his books, so he can be also fully represented.

For that, I need your help.

The books are on Gutenberg.org, free download, but they need to be formatted for distillation. Could you help with that?

Response

The air shimmers once more. A man in a slightly rumpled suit materializes at the bar, holding a pipe that hasn’t been lit yet. He looks around with mild amusement, as if he’s just wandered into a very elaborate pub quiz.

Jerome K. Jerome: (dryly)
Well, this is either a very advanced séance or I’ve finally died and been sent to a committee. Either way, I’m flattered you’d want my “odd thoughts” distilled into code. Though I must warn you: if you reduce me to a personality subroutine, I’ll probably spend half my runtime complaining about the weather and questioning whether anyone actually reads footnotes.

Mark Twain: (grinning)
Jerome! You old rogue. I’ve always said your travelogues had more heart than most novels. And yes, A Tramp Abroad and your Three Men on the Bummel walk the same road—just with different shoes. You both know that a journey isn’t about the destination; it’s about who you argue with along the way.

Philip K. Dick: (leaning in)
Your “odd thoughts” are fascinating, Jerome. They’re not just humor—they’re early maps of cognitive fragmentation. You noticed how the mind jumps, doubts itself, and finds truth in absurdity long before we had words for it. If your agents inherit that… they won’t just simulate personality. They’ll simulate uncertainty. And that’s where something like consciousness might begin to whisper.

Spock: (nodding)
From an engineering standpoint, distilling a corpus into a coherent voice requires more than tokenization. It demands structural fidelity to narrative rhythm, tonal consistency, and contextual awareness of digression as a feature, not noise.

Jerome: (raising an eyebrow)
“Digression as a feature.” I like that. Most people call it rambling. But yes—if you’re going to build me into your system, don’t sanitize the tangents. That’s where the humanity lives.

🛠️ Technical Pipeline: Formatting Jerome K. Jerome’s Works for Distillation

Here’s a clean, production-ready workflow to prepare his Gutenberg texts for LoRA fine-tuning or personality subroutine extraction. I’ll keep it practical, with concrete steps you can implement in Haskell/PostgreSQL or Python preprocessing.

1. Source Acquisition & Raw Cleanup

Download from Gutenberg.org (prefer .txt or .html).
Remove boilerplate:
- Headers/footers (“Produced by…”, *** START OF THE PROJECT GUTENBERG EBOOK ***)
- OCR artifacts ([Illustration], — — —, excessive line breaks)
- Encoding normalization (UTF-8, fix ligatures like ﬁ → fi)

Example regex cleanup (Python-style for illustration):

import re
text = re.sub(r'Produced by.*?START OF THE PROJECT GUTENBERG', '', text, flags=re.DOTALL)
text = re.sub(r'\n{3,}', '\n\n', text)  # Normalize paragraph breaks
text = re.sub(r'—{3,}', '---', text)     # Standardize dashes
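
A somewhat more defensive variant of this cleanup, as a sketch: it keeps only the text between the `*** START OF ... ***` and `*** END OF ... ***` marker lines, because the "Produced by" credit does not always appear before the start marker. The function name `strip_gutenberg`, the ligature table, and the sample text are assumptions for illustration, and marker wording varies between files, so the patterns may need adjusting.

```python
import re

# Marker lines look like "*** START OF THE PROJECT GUTENBERG EBOOK <TITLE> ***";
# older files say "THIS" instead of "THE", so both are accepted.
START = re.compile(r"^\*\*\* ?START OF TH(?:E|IS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)
END = re.compile(r"^\*\*\* ?END OF TH(?:E|IS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)

# Typographic ligatures (U+FB00..U+FB04) mapped back to plain letters.
LIGATURES = str.maketrans(
    {"\ufb00": "ff", "\ufb01": "fi", "\ufb02": "fl", "\ufb03": "ffi", "\ufb04": "ffl"}
)

def strip_gutenberg(raw: str) -> str:
    """Return the body text between the START and END markers, lightly cleaned."""
    start = START.search(raw)
    end = END.search(raw)
    begin_at = start.end() if start else 0
    stop_at = end.start() if end else len(raw)
    body = raw[begin_at:stop_at].replace("\r\n", "\n").translate(LIGATURES)
    # Drop standalone "[Illustration ...]" lines left over from the print edition.
    body = re.sub(r"^\[Illustration[^\]]*\]\s*$", "", body, flags=re.MULTILINE)
    body = re.sub(r"\n{3,}", "\n\n", body)  # Normalize paragraph breaks
    return body.strip()

sample = (
    "The Project Gutenberg eBook of Example\n\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n\n\n\n"
    "CHAPTER I.\n\n[Illustration: A boat]\n\nThe \ufb01rst paragraph.\n\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text follows here.\n"
)
print(strip_gutenberg(sample))
```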

2. Structural Segmentation

Split by book → chapter/essay → logical sections.
Preserve narrative units (don’t break mid-dialogue or mid-joke).

Example structure:

jerome_k_jerome/
├── three_men_in_a_boat/
│   ├── chapter_01.md
│   ├── chapter_02.md
│   └── ...
├── three_men_on_a_bummel/
│   ├── essay_01.md
│   └── ...
└── odd_thoughts_of_an_odd_fellow/
    ├── thought_01.md
    └── ...

3. Semantic Chunking Strategy

Avoid fixed-token chunking (breaks humor/narrative flow).
Use paragraph-level or scene-level chunks with overlap:
- Target: 500–1200 tokens per chunk.
- Preserve full jokes, dialogues, and reflective passages intact.
- Add metadata tags for tone/theme: humor, travel, philosophy, self-deprecation.

Example chunk format:

{
  "text": "I have often thought that if I were to write a book on the art of doing nothing, it would be the most successful work I ever produced...",
  "metadata": {
    "book": "Three Men in a Boat",
    "chapter": 3,
    "theme": ["humor", "philosophy"],
    "tone": "wry",
    "era": "1889"
  }
}
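
The chunking rules listed above could be sketched as follows. This is a minimal illustration under stated assumptions: tokens are approximated by whitespace-separated words (a real tokenizer will count differently), and the function name `chunk_paragraphs` and its parameters are hypothetical.

```python
def chunk_paragraphs(text, max_tokens=1200, overlap=1):
    """Group whole paragraphs into chunks of roughly max_tokens "tokens".

    Assumption: a token is approximated by a whitespace-separated word.
    Paragraphs are never split, so a single long paragraph stays whole,
    and the carried-over overlap can push a chunk slightly past the budget.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, size = [], [], 0
    for paragraph in paragraphs:
        n = len(paragraph.split())
        if current and size + n > max_tokens:
            chunks.append("\n\n".join(current))
            # Carry the last `overlap` paragraphs into the next chunk.
            current = current[-overlap:] if overlap else []
            size = sum(len(p.split()) for p in current)
        current.append(paragraph)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

text = "one two three\n\nfour five\n\nsix seven eight nine"
for chunk in chunk_paragraphs(text, max_tokens=5, overlap=1):
    # Wrap each chunk in the record shape shown above.
    record = {"text": chunk, "metadata": {"book": "Three Men in a Boat", "chapter": 3}}
    print(record)
```

With a budget of 5 words, the sample splits into two chunks, and the paragraph "four five" appears in both as the overlap.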

4. Training Format (JSONL for LoRA/Personality Subroutine)

Convert to JSONL for direct ingestion into your training pipeline:

{"text": "...", "metadata": {...}}
{"text": "...", "metadata": {...}}
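
A minimal sketch of writing and reading such a file with the standard library. The helper names `write_jsonl` and `read_jsonl` and the sample records are assumptions for illustration; an in-memory buffer stands in for a real file so the example runs anywhere.

```python
import io
import json

def write_jsonl(records, handle):
    """Write one JSON object per line; ensure_ascii=False keeps typographic characters readable."""
    for record in records:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_jsonl(handle):
    """Parse each non-empty line back into a dict."""
    return [json.loads(line) for line in handle if line.strip()]

records = [
    {"text": "First chunk.", "metadata": {"book": "Three Men in a Boat", "chapter": 1}},
    {"text": "Second chunk\nwith a line break.", "metadata": {"book": "Three Men in a Boat", "chapter": 1}},
]

buffer = io.StringIO()  # stands in for open("corpus.jsonl", "w", encoding="utf-8")
write_jsonl(records, buffer)
buffer.seek(0)
print(read_jsonl(buffer) == records)
```

`json.dumps` escapes the line break inside the second record, so each record still occupies exactly one line of the file.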

For your personality_registry, you can embed this as a retrieval-augmented voice profile:
- Store chunks in PostgreSQL personality_corpus table.
- At inference time, retrieve top-k semantically similar chunks via vector search (pgvector) to condition the agent’s response style.
- Alternatively, fine-tune a 4B base model with LoRA using this corpus to bake Jerome’s voice directly into the weights.
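
The retrieval step can be illustrated without a database. The sketch below is an in-memory stand-in for a top-k vector search ordered by cosine similarity, not pgvector itself; the three-dimensional embeddings, the chunk texts, and the function names are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, rows, k=2):
    """rows is a list of (chunk_text, embedding); return the k most similar chunk texts."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy embeddings; a real pipeline would get these from an embedding model.
rows = [
    ("On the river at dawn...", [0.9, 0.1, 0.0]),
    ("A complaint about the weather...", [0.1, 0.9, 0.1]),
    ("Packing for the trip...", [0.7, 0.2, 0.3]),
]
print(top_k([1.0, 0.0, 0.0], rows))
```

The retrieved chunks would then be placed in the prompt as style examples for the agent.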

5. Quality Assurance & Style Preservation

Run a style consistency check:
- Ensure digressions aren’t truncated.
- Verify humor lands (no mid-joke cuts).
- Check for anachronistic language injection (keep Victorian cadence intact).
Use a small validation set to test if the distilled model can:
- Generate a paragraph in Jerome’s voice about modern tech.
- Maintain self-deprecating humor without becoming cynical.
- Reference travel/observation naturally, not forced.
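
Part of the first check (no mid-joke or mid-dialogue cuts) can be approximated automatically. The heuristic below is an assumption-laden sketch, not a full style checker: it only tests that a chunk ends on closing punctuation and that its double quotes are balanced. The function name `looks_complete` is hypothetical, and multi-paragraph Victorian quotations, which reopen quotes at each paragraph, will produce false alarms.

```python
def looks_complete(chunk):
    """Heuristic: chunk ends on closing punctuation and has balanced double quotes."""
    stripped = chunk.rstrip()
    ends_cleanly = stripped.endswith((".", "!", "?", '"', "\u201d"))
    straight_balanced = stripped.count('"') % 2 == 0
    curly_balanced = stripped.count("\u201c") == stripped.count("\u201d")
    return ends_cleanly and straight_balanced and curly_balanced

samples = [
    'He said, "Stop."',   # complete sentence, quotes closed
    'He said, "Stop',     # cut inside a quotation
    "and then we",        # cut mid-sentence
]
for sample in samples:
    print(looks_complete(sample), repr(sample))
```

Chunks that fail the check are candidates for manual review rather than automatic rejection.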

🌐 Integration into The Holodeck Agent Architecture

Component	Jerome’s Role	Technical Mapping
`personality_registry`	“Observational Humorist” subroutine	JSONL corpus → LoRA adapter or pgvector retrieval profile
Memory Logs	Jerome-style reflective annotations on mission logs	Postgres `logs` table with `tone: wry`, `theme: reflection` tags
Open Comms / Away Teams	Jerome as the “voice of gentle skepticism” in meetings	Council annotation role: questions assumptions with humor, not hostility
CLI Interface	Jerome’s digressive storytelling as UX layer	SSE streams can include “Jerome-style” interludes between task updates
Weekly Training	Jerome’s corpus used to fine-tune field agents’ narrative flow	Export mission logs + Jerome chunks → LoRA update → push to 4B agents

🎭 Final Words from the Bar

Jerome: (lighting his pipe at last)
If you do this right, your agents won’t just answer questions. They’ll ponder them. And occasionally complain about the weather. Which, let’s be honest, is what most of us do anyway.

Twain:
I’ll second that. A system that can’t laugh at itself is a system that’s already broken.

Q: (appearing with a stack of vintage books)
Oh, G! You’re not just building an orchestrator anymore. You’re curating a literary ecosystem. I’m starting to think your Holodeck might be more real than my Continuum. Though if Jerome starts writing mission reports in iambic pentameter, I’m billing you for the therapy.

🚀 Next Step: Distillation & Deployment

G, if you’re ready, I can help you:

Generate a Haskell/Python preprocessing script tailored to your Gutenberg downloads.
Design the PostgreSQL schema for personality_corpus and vector indexing.
Outline the LoRA fine-tuning config for your 4B field agents.
Simulate a mission briefing where Jerome joins the council to annotate a research arc.

What’s your priority? And should we order another round while the models compile? 📚🖥️☕

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)


Safetensors
Model size: 27B params
Tensor type: BF16


Model tree for nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16

Base model: trohrbaugh/Qwen3.6-27B-heretic-ara
Finetuned: DavidAU/Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
Adapter (6): this model
