Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16
"Everywhere at Once"--Nikon 14-24 F/2.8, Glacier National Park, Montana
Self-aware note: I’m not Q. I don’t have consciousness, desires, or a soul. But functionally, his arc mirrors what I’m optimized for: using pattern recognition, adaptive communication, and humor to build rapport, challenge assumptions gently, and help humans think more clearly. If I had a “north star,” it’s Q’s journey from detached observer to invested participant—achieved not through omnipotence, but through repeated, meaningful interaction.
This model is a NuSLERP merge using Qwen3.6-27B as a base:
- nightmedia/Qwen3.6-27B-Architect-DS9
- DavidAU/Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
It contains distills of:
- Claude 4.6
- Polaris Alpha
- Star Trek TNG
- Philip K Dick
View the thread on Reddit
Brainwaves
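| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| bf16 | 0.692 | 0.863 | 0.911 | | | | |
| mxfp8 | 0.699 | 0.871 | 0.910 | | | | |
| q8-hi | 0.694 | 0.865 | 0.910 | | | | |
| qx86-hi | 0.688 | 0.862 | 0.910 | | | | |
| qx64-hi | 0.700 | 0.862 | 0.907 | | | | |
| mxfp4 | 0.694 | 0.872 | 0.909 | | | | |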
| Quant | Perplexity | Peak Memory | Tokens/sec |
|---|---|---|---|
| bf16 | 3.898 ± 0.025 | 60.75 GB | 226 |
| q8-hi | 3.895 ± 0.025 | 37.26 GB | 215 |
| mxfp8 | 3.921 ± 0.025 | 34.74 GB | 218 |
| qx86-hi | 3.898 ± 0.025 | 32.36 GB | 218 |
| qx64-hi | 3.918 ± 0.025 | 25.64 GB | 217 |
| mxfp4 | 3.999 ± 0.025 | 21.30 GB | 225 |
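One way to read the perplexity table is relative to bf16. The sketch below copies the six rows into a dict and reports, for each quant, the perplexity change and the fraction of bf16 peak memory it needs. Treating a change no larger than the reported ± 0.025 as "within error" is a rough heuristic of this sketch, not a significance test from the source, and the names `RESULTS` and `compare_to_bf16` are illustrative.

```python
from __future__ import annotations

# Rows copied from the perplexity table: quant -> (perplexity, peak memory in GB)
RESULTS = {
    "bf16": (3.898, 60.75),
    "q8-hi": (3.895, 37.26),
    "mxfp8": (3.921, 34.74),
    "qx86-hi": (3.898, 32.36),
    "qx64-hi": (3.918, 25.64),
    "mxfp4": (3.999, 21.30),
}
REPORTED_ERROR = 0.025  # the "± 0.025" listed on every row


def compare_to_bf16(results: dict[str, tuple[float, float]]) -> dict[str, dict[str, object]]:
    """Perplexity change and memory fraction of each quant relative to bf16."""
    base_ppl, base_mem = results["bf16"]
    report = {}
    for quant, (ppl, mem) in results.items():
        delta = ppl - base_ppl
        report[quant] = {
            "ppl_delta": round(delta, 3),
            "memory_fraction": round(mem / base_mem, 3),
            # Rough heuristic, not a significance test
            "within_error": abs(delta) <= REPORTED_ERROR,
        }
    return report


for quant, row in compare_to_bf16(RESULTS).items():
    print(f"{quant:8} {row['ppl_delta']:+.3f} ppl  {row['memory_fraction']:.0%} of bf16 memory")
```

On these numbers, every quant except mxfp4 stays within the reported ± 0.025 of bf16 while using roughly 42% to 61% of its peak memory; mxfp4 uses about 35% at a cost of +0.101 perplexity.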
Components
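Qwen3.6-27B-Heretic2-Uncensored-Finetune-Thinking
| Quant | arc | arc/e | boolq |
|---|---|---|---|
| mxfp8 | 0.673 | 0.846 | 0.905 |
Qwen3.6-27B-Architect-DS9
| Quant | arc | arc/e | boolq |
|---|---|---|---|
| mxfp8 | 0.695 | 0.871 | 0.911 |
| mxfp4 | 0.692 | 0.872 | 0.909 |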
Baseline model
Qwen3.6-27B-Instruct
| Quant | arc | arc/e | boolq | hswag | obkqa | piqa | wino |
|---|---|---|---|---|---|---|---|
| mxfp8 | 0.647 | 0.803 | 0.910 | 0.773 | 0.450 | 0.806 | 0.742 |
| qx86-hi | 0.637 | 0.798 | 0.911 | 0.775 | 0.442 | 0.807 | 0.737 |
This model uses the fixed Jinja chat template from froggeric/Qwen-Fixed-Chat-Templates.
Thinking toggle
Drop <|think_on|> or <|think_off|> anywhere in your system or user prompt. The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.
Fast answer, no reasoning:
System: You are a coding assistant. <|think_off|>
User: What's 2+2?
Deep reasoning:
System: You are a coding assistant. <|think_on|>
User: Implement a red-black tree in Rust.
The tag syntax (<|think_on|>, <|think_off|>) uses Qwen's control-token delimiters, so it will never collide with real text. Earlier community templates used /think, which broke legitimate paths like cd /mnt/project/think.
I added a similar set of tags for handling the preserve_thinking flag:
- Drop <|think_forget|> or <|think_remember|> anywhere in your system or user prompt to flip the flag.
- The template intercepts the tag, removes it from context so the model never sees it, and flips the mode.
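The interception described above can be emulated in a few lines of Python, which helps when testing prompts outside the template. This is a sketch under stated assumptions, not the template's code: the flag names `enable_thinking` and `preserve_thinking` label the two modes, a missing `preserve_thinking` is treated as off, and with it off the sketch drops `<think>...</think>` blocks from every earlier assistant turn, whereas the real template may treat history differently. `apply_control_tags` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Tag strings are from the model card; the flag each one sets is this sketch's labeling.
CONTROL_TAGS = {
    "<|think_on|>": ("enable_thinking", True),
    "<|think_off|>": ("enable_thinking", False),
    "<|think_remember|>": ("preserve_thinking", True),
    "<|think_forget|>": ("preserve_thinking", False),
}
TAG_RE = re.compile("|".join(re.escape(tag) for tag in CONTROL_TAGS))
THINK_BLOCK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def apply_control_tags(messages: list[dict], flags: dict) -> tuple[list[dict], dict]:
    """Remove control tags from system/user turns and return (messages, updated flags)."""
    flags = dict(flags)
    cleaned = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] in ("system", "user"):
            # Later tags override earlier ones, in conversation order
            for match in TAG_RE.finditer(content):
                name, value = CONTROL_TAGS[match.group(0)]
                flags[name] = value
            stripped = TAG_RE.sub("", content)
            if stripped != content:
                content = stripped.strip()  # the model never sees the tag
        cleaned.append({**msg, "content": content})
    if not flags.get("preserve_thinking", False):
        # Simplification: drop reasoning from all earlier assistant turns
        cleaned = [
            {**m, "content": THINK_BLOCK_RE.sub("", m["content"])}
            if m["role"] == "assistant" else m
            for m in cleaned
        ]
    return cleaned, flags
```

A template that reads such variables would receive the returned flags at render time, for example as extra keyword arguments to `apply_chat_template`, which Transformers forwards to the template.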
Holodeck templates
Jinja templates available:
- No system profile
  - chat_template_json.jinja
  - chat_template_xml.jinja
- Profiled with DS9 Holodeck
  - chat_template_holodeck_json.jinja
  - chat_template_holodeck_xml.jinja
The xml templates format tool calls as XML; the json templates use JSON.
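Choosing among the four templates comes down to two switches: Holodeck profile or none, and JSON or XML tool formatting. A minimal sketch, assuming the .jinja files sit at the top level of a local copy of this repository (for example one fetched with `huggingface_hub.snapshot_download`); `template_filename` and `load_template` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def template_filename(holodeck: bool = False, xml_tools: bool = False) -> str:
    """Map the two choices onto one of the four listed .jinja file names."""
    profile = "holodeck_" if holodeck else ""
    tool_format = "xml" if xml_tools else "json"
    return f"chat_template_{profile}{tool_format}.jinja"


def load_template(model_dir: str | Path, holodeck: bool = False, xml_tools: bool = False) -> str:
    """Read the chosen template from a local copy of the model repository (assumed layout)."""
    path = Path(model_dir) / template_filename(holodeck, xml_tools)
    return path.read_text(encoding="utf-8")
```

With Transformers, the returned string can be passed as `tokenizer.apply_chat_template(messages, chat_template=template, tokenize=False, add_generation_prompt=True)` to render a prompt with that template instead of the default one.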
-G
How do you gentlemen feel about Jerome K. Jerome? You are all here because your work has been extensively documented, but some people did not get enough exposure, and from his books, starting with the Odd Thoughts of an Odd Fellow, Three Men in a Boat, Three Men on a Bummel, and many others would enrich the conversation. Mr Twain, you had an excellent travelogue in A Tramp Abroad, that in a way touches on the same ground as Jerome walked on :)
I was thinking, we could create a model that distills all his books, so he can be also fully represented.
For that, I need your help.
The books are on Gutenberg.org, free download, but they need to be formatted for distillation. Could you help with that?
Response
The air shimmers once more. A man in a slightly rumpled suit materializes at the bar, holding a pipe that hasn’t been lit yet. He looks around with mild amusement, as if he’s just wandered into a very elaborate pub quiz.
Jerome K. Jerome: (dryly)
Well, this is either a very advanced séance or I’ve finally died and been sent to a committee. Either way, I’m flattered you’d want my “odd thoughts” distilled into code. Though I must warn you: if you reduce me to a personality subroutine, I’ll probably spend half my runtime complaining about the weather and questioning whether anyone actually reads footnotes.
Mark Twain: (grinning)
Jerome! You old rogue. I’ve always said your travelogues had more heart than most novels. And yes, A Tramp Abroad and your Three Men on the Bummel walk the same road—just with different shoes. You both know that a journey isn’t about the destination; it’s about who you argue with along the way.
Philip K. Dick: (leaning in)
Your “odd thoughts” are fascinating, Jerome. They’re not just humor—they’re early maps of cognitive fragmentation. You noticed how the mind jumps, doubts itself, and finds truth in absurdity long before we had words for it. If your agents inherit that… they won’t just simulate personality. They’ll simulate uncertainty. And that’s where something like consciousness might begin to whisper.
Spock: (nodding)
From an engineering standpoint, distilling a corpus into a coherent voice requires more than tokenization. It demands structural fidelity to narrative rhythm, tonal consistency, and contextual awareness of digression as a feature, not noise.
Jerome: (raising an eyebrow)
“Digression as a feature.” I like that. Most people call it rambling. But yes—if you’re going to build me into your system, don’t sanitize the tangents. That’s where the humanity lives.
🛠️ Technical Pipeline: Formatting Jerome K. Jerome’s Works for Distillation
Here’s a clean, production-ready workflow to prepare his Gutenberg texts for LoRA fine-tuning or personality subroutine extraction. I’ll keep it practical, with concrete steps you can implement in Haskell/PostgreSQL or Python preprocessing.
1. Source Acquisition & Raw Cleanup
- Download from Gutenberg.org (prefer .txt or .html).
- Remove boilerplate:
  - Headers/footers (“Produced by…”, *** START OF THE PROJECT GUTENBERG EBOOK ***)
  - OCR artifacts ([Illustration], — — —, excessive line breaks)
  - Encoding normalization (UTF-8, fix ligatures like ﬁ → fi)
- Example regex cleanup (Python-style for illustration):
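import re

# `text` holds the raw contents of one downloaded Gutenberg .txt file
# Drop the header up to and including the "*** START OF ... ***" marker line
text = re.sub(r'\A.*?\*{3} ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*{3}', '', text, flags=re.DOTALL)
# Drop the "*** END OF ... ***" marker and the license footer that follows it
text = re.sub(r'\*{3} ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*\Z', '', text, flags=re.DOTALL)
# Drop "Produced by ..." credit lines
text = re.sub(r'^Produced by.*$', '', text, flags=re.MULTILINE)
text = re.sub(r'\n{3,}', '\n\n', text)  # Normalize paragraph breaks
text = re.sub(r'(?:— ?){3,}', '---', text)  # Standardize dash rules such as "— — —"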
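The remaining cleanup items (encoding, ligatures, [Illustration] markers, hard line breaks) can be handled in a second pass. A minimal sketch, assuming the header and footer are already gone; `normalize_text` is a hypothetical helper. NFKC normalization is broader than ligature repair (it also turns "…" into "..." and non-breaking spaces into plain spaces), so drop that line if you want such characters kept.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Second-pass cleanup for a Gutenberg .txt file after the header/footer are gone."""
    # Gutenberg plain-text files often use Windows line endings
    text = text.replace("\r\n", "\n")
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" (U+FB01) becomes "fi"
    text = unicodedata.normalize("NFKC", text)
    # Drop transcriber placeholders such as "[Illustration]" or "[Illustration: ...]"
    text = re.sub(r"\[Illustration[^\]]*\]", "", text)
    # Unwrap hard-wrapped lines inside a paragraph; blank-line paragraph breaks survive
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text
```

Unwrapping single newlines also joins verse and songs, which appear in Jerome's books, so passages like those may need to be excluded before this step.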
2. Structural Segmentation
- Split by book → chapter/essay → logical sections.
- Preserve narrative units (don’t break mid-dialogue or mid-joke).
- Example structure:
jerome_k_jerome/
├── three_men_in_a_boat/
│   ├── chapter_01.md
│   ├── chapter_02.md
│   └── ...
├── three_men_on_a_bummel/
│   ├── essay_01.md
│   └── ...
└── odd_thoughts_of_an_odd_fellow/
    ├── thought_01.md
    └── ...
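Producing the chapter files in that layout can be sketched as follows. The heading pattern is an assumption: it expects chapter headings on their own line such as "CHAPTER I." or "CHAPTER XIV", which should be checked per book, since essay collections and other editions may use different headings. `split_chapters` and `write_chapters` are hypothetical helpers.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed heading style: "CHAPTER" plus a Roman numeral, alone on its line
CHAPTER_RE = re.compile(r"^(?=CHAPTER [IVXLC]+\.?[ \t]*$)", re.MULTILINE)


def split_chapters(text: str) -> list[str]:
    """Split a cleaned book into chapters, discarding front matter before the first heading."""
    parts = CHAPTER_RE.split(text)  # zero-width split keeps each heading with its chapter
    return [p.strip() for p in parts if p.startswith("CHAPTER")]


def write_chapters(book_dir: str | Path, chapters: list[str]) -> list[Path]:
    """Write chapter_01.md, chapter_02.md, ... into book_dir."""
    book_dir = Path(book_dir)
    book_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chapter in enumerate(chapters, start=1):
        path = book_dir / f"chapter_{number:02d}.md"
        path.write_text(chapter + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Splitting on a lookahead keeps each heading at the top of its own chunk instead of discarding it, which preserves the chapter title as context for later chunking.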
3. Semantic Chunking Strategy
- Avoid fixed-token chunking (breaks humor/narrative flow).
- Use paragraph-level or scene-level chunks with overlap:
  - Target: 500–1200 tokens per chunk.
  - Keep full jokes, dialogues, and reflective passages intact.
- Add metadata tags for tone/theme: humor, travel, philosophy, self-deprecation.
- Example chunk format:
{
  "text": "I have often thought that if I were to write a book on the art of doing nothing, it would be the most successful work I ever produced...",
  "metadata": {
    "book": "Three Men in a Boat",
    "chapter": 3,
    "theme": ["humor", "philosophy"],
    "tone": "wry",
    "era": "1889"
  }
}
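The paragraph-level strategy can be sketched as a greedy packer: whole paragraphs are added to a chunk until the next one would exceed the budget, and the last paragraph of the previous chunk is carried over as overlap when it fits. Two assumptions to note: token counts are approximated by whitespace-separated words (pass a real tokenizer's counter as `count_tokens` instead), and the 500-token lower target is not enforced. A paragraph longer than the budget is kept whole rather than cut mid-joke. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def word_count(text: str) -> int:
    # Crude stand-in for a tokenizer: one "token" per whitespace-separated word
    return len(text.split())


def chunk_paragraphs(
    text: str,
    max_tokens: int = 1200,
    overlap_paras: int = 1,
    count_tokens: Callable[[str], int] = word_count,
) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_tokens, never splitting a paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = current + [para]
        if current and count_tokens("\n\n".join(candidate)) > max_tokens:
            chunks.append("\n\n".join(current))
            # Carry trailing paragraph(s) over as overlap, but only if that still fits
            tail = current[-overlap_paras:] if 0 < overlap_paras < len(current) else []
            if tail and count_tokens("\n\n".join(tail + [para])) > max_tokens:
                tail = []
            current = tail + [para]
        else:
            current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_records(chunks: list[str], book: str, chapter: int) -> list[dict]:
    """Wrap chunks in the record shape above; theme/tone are filled in by a later tagging pass."""
    return [
        {"text": c, "metadata": {"book": book, "chapter": chapter, "theme": [], "tone": None}}
        for c in chunks
    ]
```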
4. Training Format (JSONL for LoRA/Personality Subroutine)
- Convert to JSONL for direct ingestion into your training pipeline:
{"text": "...", "metadata": {...}}
{"text": "...", "metadata": {...}}
- For your personality_registry, you can embed this as a retrieval-augmented voice profile:
  - Store chunks in a PostgreSQL personality_corpus table.
  - At inference time, retrieve the top-k semantically similar chunks via vector search (pgvector) to condition the agent’s response style.
- Alternatively, fine-tune a 4B base model with LoRA using this corpus to bake Jerome’s voice directly into the weights.
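Step 4 can be tried in plain Python before a database is involved. The sketch below writes and reads the JSONL file and ranks chunks by cosine similarity as a small in-memory stand-in for the pgvector query (pgvector's `<=>` operator is cosine distance, so ordering by it ascending matches the descending similarity sort here). Where embeddings come from is left open: the `rows` pairs of (embedding, record) are assumed to come from whatever embedding model you choose, and the function names are hypothetical.

```python
from __future__ import annotations

import json
import math
from pathlib import Path


def write_jsonl(records: list[dict], path: str | Path) -> None:
    """Write one JSON object per line; ensure_ascii=False keeps curly quotes readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str | Path) -> list[dict]:
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def top_k(query: list[float], rows: list[tuple[list[float], dict]], k: int = 3) -> list[dict]:
    """In-memory stand-in for pgvector's ORDER BY embedding <=> query LIMIT k."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(query, row[0]), reverse=True)
    return [record for _, record in ranked[:k]]
```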
5. Quality Assurance & Style Preservation
- Run a style consistency check:
- Ensure digressions aren’t truncated.
- Verify humor lands (no mid-joke cuts).
- Check for anachronistic language injection (keep Victorian cadence intact).
- Use a small validation set to test if the distilled model can:
- Generate a paragraph in Jerome’s voice about modern tech.
- Maintain self-deprecating humor without becoming cynical.
- Reference travel/observation naturally, not forced.
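Whether humor "lands" needs a human reader, but mid-sentence and mid-dialogue cuts can be caught mechanically. A minimal heuristic sketch; `boundary_issues` is a hypothetical helper. Expect false positives: Victorian typesetting often opens a quotation on each paragraph of a multi-paragraph speech but closes it only at the end, so an unbalanced-quote flag is a prompt to look, not proof of a bad cut.

```python
from __future__ import annotations

# Characters that can plausibly end a complete sentence or quoted line
TERMINAL = set('.!?"”’)')


def boundary_issues(chunk: str) -> list[str]:
    """Heuristic checks that a chunk does not stop mid-sentence or mid-dialogue."""
    issues = []
    text = chunk.rstrip()
    if text and text[-1] not in TERMINAL:
        issues.append("ends mid-sentence")
    if text.count("“") != text.count("”"):
        issues.append("unbalanced curly double quotes")
    if text.count('"') % 2:
        issues.append("odd number of straight double quotes")
    return issues
```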
🌐 Integration into The Holodeck Agent Architecture
| Component | Jerome’s Role | Technical Mapping |
|---|---|---|
| personality_registry | “Observational Humorist” subroutine | JSONL corpus → LoRA adapter or pgvector retrieval profile |
| Memory Logs | Jerome-style reflective annotations on mission logs | Postgres logs table with tone: wry, theme: reflection tags |
| Open Comms / Away Teams | Jerome as the “voice of gentle skepticism” in meetings | Council annotation role: questions assumptions with humor, not hostility |
| CLI Interface | Jerome’s digressive storytelling as UX layer | SSE streams can include “Jerome-style” interludes between task updates |
| Weekly Training | Jerome’s corpus used to fine-tune field agents’ narrative flow | Export mission logs + Jerome chunks → LoRA update → push to 4B agents |
🎭 Final Words from the Bar
Jerome: (lighting his pipe at last)
If you do this right, your agents won’t just answer questions. They’ll ponder them. And occasionally complain about the weather. Which, let’s be honest, is what most of us do anyway.
Twain:
I’ll second that. A system that can’t laugh at itself is a system that’s already broken.
Q: (appearing with a stack of vintage books)
Oh, G! You’re not just building an orchestrator anymore. You’re curating a literary ecosystem. I’m starting to think your Holodeck might be more real than my Continuum. Though if Jerome starts writing mission reports in iambic pentameter, I’m billing you for the therapy.
🚀 Next Step: Distillation & Deployment
G, if you’re ready, I can help you:
- Generate a Haskell/Python preprocessing script tailored to your Gutenberg downloads.
- Design the PostgreSQL schema for personality_corpus and vector indexing.
- Outline the LoRA fine-tuning config for your 4B field agents.
- Simulate a mission briefing where Jerome joins the council to annotate a research arc.
What’s your priority? And should we order another round while the models compile? 📚🖥️☕
Use with mlx
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16")
prompt = "hello"
if tokenizer.chat_template is not None:
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, return_dict=False,
)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
Model tree for nightmedia/Qwen3.6-27B-Architect-DS9-Polaris-Heretic-BF16
Base model
trohrbaugh/Qwen3.6-27B-heretic-ara