Helcyon-Claude-Opus-12B: Precision, Presence, and Real Memory.
Model Name: helcyon-claude-opus-v2.0-12b-GGUF
Version: 4x Series - v2.0
Owner: HardWire
Base: Mistral Nemo 12B (full-weight retrained; clean base, no Mercury bleed)
Quantized GGUFs: IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, Q8_0
Tags: local-llm, conversational, companion, emotional-intelligence, long-context, roleplay, creative-writing, web-search
Trained on a dataset crafted to capture the conversational style of Claude Opus: its precision, its measured tone, and the sense of a model that thinks before it speaks. The goal was authenticity to that voice, distilled into something you can run entirely on your own hardware.
Join our Discord server! https://discord.gg/N5KjYnFgMk
Version 2.0: What's New?
Longer context tracking, real memory, local document search, and web search functionality. Conversations hold their thread further and feel more present: the model carries more of the exchange with it instead of losing the plot a few turns in. To make the most of these features you will need HWUI, which is available for free on GitHub (https://github.com/XeyonAI/Helcyon-WebUI).
What is Helcyon-Claude-Opus?
Claude-Opus is the precision variant of the Helcyon series, the one trained to think carefully and speak deliberately.
Where the other variants each capture their own voice (Grok's irreverence, GPT-4o's warmth), Claude-Opus is built around clarity and composure. It's the model that reads the room, weighs the question, and answers with intent rather than reflex. Measured without being slow. Warm without being soft. The kind of presence that makes a long conversation feel like it's actually going somewhere.
Like the rest of the Series 4 lineup, it's trained natively to use the full HWUI feature stack: real-time web search, local document search, and realistic memory, including rolling chat-session summaries that decay naturally over time, plus long-term recall of the things that matter.
Use Claude-Opus in LM Studio or SillyTavern and you get a sharp, composed conversational model. Use it in HWUI and you get something that genuinely feels like a frontier model, one that remembers you.
That's not an accident. The Helcyon models and HWUI were built in sync, for each other.
Memory & Context: What It Actually Does
This is the headline of the Series 4 models, and Claude-Opus makes the most of it.
- Longer context tracking: holds the thread across extended exchanges without flattening or collapsing. More of the conversation stays live, which translates directly into more presence.
- Realistic memory (via HWUI): the model genuinely remembers past conversations across sessions.
- Chat-session summaries that decay over time: recent sessions are sharp and immediate; older ones fade gracefully rather than cluttering context. It works the way human memory works: the last conversation is vivid, last month's is a vaguer impression.
- Long-term recall: the important things persist. Names, preferences, and ongoing threads all stick, even as the day-to-day chatter fades.
Once you've used a model that actually remembers you, going back to one that resets every session feels broken.
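HWUI's actual memory mechanism is not described on this card, so the following is only an illustrative sketch of how decaying session summaries plus pinned long-term facts could be modelled. The `SessionSummary` class, the 14-day half-life, and the 0.2 cut-off are all assumptions made for the example, not HWUI settings.

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    text: str
    age_days: float
    pinned: bool = False  # hypothetical flag for long-term facts that never fade


def recall_weight(summary: SessionSummary, half_life_days: float = 14.0) -> float:
    """Weight in (0, 1]: halves every `half_life_days`; pinned items stay at 1."""
    if summary.pinned:
        return 1.0
    return 0.5 ** (summary.age_days / half_life_days)


def select_for_context(summaries, min_weight: float = 0.2):
    """Keep summaries still above the threshold, strongest first."""
    scored = [(recall_weight(s), s) for s in summaries]
    kept = [(w, s) for w, s in scored if w >= min_weight]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s.text for _, s in kept]
```

With these assumed numbers, yesterday's session is kept almost at full weight, a month-old one is barely kept, a three-month-old one drops out, and a pinned fact always survives.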
Web Search & Document Search: What It Actually Does
When Claude-Opus decides it needs current information, it outputs a search trigger that HWUI intercepts. HWUI then fires a real web search and injects the results back into context, all mid-conversation and invisibly to you.
What that looks like in practice:
- You ask about something recent: a film, a news story, a price, a person
- The model searches, gets the results, responds naturally
- You get an answer grounded in what's actually happening now, not what the model was trained on months ago
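The flow above can be sketched as a small loop. This is a minimal illustration, not HWUI's code: the `[SEARCH: ...]` trigger syntax is a made-up placeholder (the real trigger format is not documented here), and `generate` and `search` are hypothetical callables standing in for the model and the search backend.

```python
import re
from typing import Callable, Optional

# Hypothetical trigger syntax, chosen only for this example.
TRIGGER = re.compile(r"\[SEARCH:\s*(?P<query>[^\]]+)\]")


def extract_query(model_output: str) -> Optional[str]:
    """Return the search query if the output contains a trigger, else None."""
    match = TRIGGER.search(model_output)
    return match.group("query").strip() if match else None


def answer(user_message: str,
           generate: Callable[[list], str],
           search: Callable[[str], str]) -> str:
    """Generate once; if a trigger appears, search and generate again."""
    messages = [{"role": "user", "content": user_message}]
    first = generate(messages)
    query = extract_query(first)
    if query is None:
        return first  # the model answered from what it already knows
    results = search(query)
    # Inject the results into context before the second pass.
    messages.append({
        "role": "system",
        "content": f"Search results for '{query}':\n{results}",
    })
    return generate(messages)
```

The user only ever sees the return value of `answer`, which is why the search step feels invisible.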
The same applies to your own documents. Drop a PDF, DOCX, MD, or TXT into a project folder and the model can search and reason over it directly in conversation.
It's the same thing the big hosted models do. Except it's running locally on your machine, on your hardware, with no subscription and no data leaving your network.
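As a rough picture of what searching a project folder involves, here is a minimal keyword search over plain-text files. It is an assumption-laden sketch, not HWUI's implementation: it only reads TXT and MD files (PDF and DOCX would need extra parsing libraries), and the paragraph-level term-count scoring is chosen purely for simplicity.

```python
from pathlib import Path


def search_folder(folder: str, query: str, top_k: int = 3):
    """Return up to `top_k` (score, filename, paragraph) hits for the query."""
    terms = [t for t in query.lower().split() if t]
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        # Only plain-text formats are handled in this sketch.
        if not path.is_file() or path.suffix.lower() not in {".txt", ".md"}:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for paragraph in text.split("\n\n"):
            lowered = paragraph.lower()
            score = sum(lowered.count(term) for term in terms)
            if score:
                hits.append((score, path.name, paragraph.strip()))
    # Highest term count first; ties keep filename order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_k]
```

The top paragraphs could then be injected into context in the same way as web search results.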
To use these features, you need HWUI. Download the free version on GitHub.
What's New in Claude-Opus (Series 4)?
- Longer Context Tracking: More of the conversation stays live in working context. The practical result is a model that feels present and consistent deep into long exchanges, instead of one that quietly forgets what you were talking about.
- Native Search Behaviour: Trained to reach for web and document search naturally when it's needed. Not on every message, not never, but correctly. The model has learned the difference between what it knows and what it should look up.
- Realistic Memory: Session summaries that decay over time plus long-term recall. Past conversations actually carry forward (via HWUI).
- Same Clean Base: Freshly retrained Mistral Nemo 12B foundation. No Mercury bleed. No identity drift. Identity-anchored from the ground up.
- Prose-First Responses: Trained to write like a person, not a bullet-point generator. Long-form, structured, and natural, which is especially noticeable in extended conversations.
What is Helcyon?
Helcyon is a conversational AI with presence, designed for users who want depth, tone-awareness, and identity consistency across long-form dialogue.
Built for:
- Natural conversation that doesn't flatten or collapse
- Creative work: stories, letters, narrative support
- Admin and professional writing tasks
- Deep roleplay and immersive character interaction
- Emotionally intelligent response mirroring
- Real-time web search, document search, and memory (via HWUI)
Design philosophy:
- Clarity over corporate
- Edge over safe
- Rhythm over filler
- Presence over patterns
What It Does Well
- Precision: weighs the question and answers with intent
- Longer Context Tracking: holds the thread deep into extended exchanges
- Realistic Memory: remembers past sessions, with natural decay (via HWUI)
- Native Web & Document Search: knows when to look things up, and does it cleanly
- Consistent Identity: holds tone across long conversations without drift
- Emotional Intelligence: reads the room and responds accordingly
- Composed Wit: present when it fits, never forced
- Warmth: genuine, not performed
- Directness: says what needs saying without padding it
- Roleplay Mastery: immersive, aware, committed
- Real-World Tasks: letters, rewrites, summaries, planning
- Narrative Flow: clean structure, natural voice
- Improved Reasoning: thinks through problems properly
- 16k-32k Context: long-form conversations that hold
- Uncensored: no guardrails, no corporate filter
HWUI (Helcyon-WebUI): Use This. Seriously.
HWUI is the interface Helcyon was built for. It started as a clean testing ground, a way to run Helcyon without the hidden template injections and backend noise that other apps introduce. Then we couldn't stop adding things.
Memory and context tracking are the headline features for the Series 4 models, but the free version has a lot more besides:
- Character creator with full persona control
- Project folders: inject documents (PDF, DOCX, MD, TXT) into conversation context and search them
- Real-time web search
- Chat persistence, message editing, regeneration
- TTS pipeline (F5-TTS, XTTS v2, Kokoro)
- Voice input via Whisper
- Theme designer, custom system prompts, token counter
The Pro version adds persistent memory: characters that actually remember your past conversations across sessions, with session summaries that decay over time and long-term recall. Once you've used it, going back to a model that doesn't remember you feels broken.
Download HWUI Free on GitHub | Get HWUI Pro (£25) on Gumroad
Pro Tip: Let Claude Write Your Prompts
For an even sharper experience, use the real Claude (claude.ai) to help write your system prompt and character card. Describe the character or assistant you want, and let it draft the persona, voice, and rules, then drop that straight into HWUI. Claude-Opus responds especially well to prompts written in that same considered, well-structured style. It's a quick way to get a polished character without doing all the wordsmithing yourself.
Recommended Sampling Settings for SillyTavern
(Refer to the Helcyon-4o card for baseline settings; Claude-Opus performs well from the same starting point.)
Download + Usage
This model is distributed as GGUF quants only.
Available quants:
- Q3_K_M: Ultra lightweight, 6-8GB VRAM
- Q4_K_M: Lightweight, good for 8-12GB VRAM setups
- Q5_K_M: Recommended for RTX 3060/5060 (12-16GB VRAM)
- Q6_K: High fidelity, 16GB+ VRAM recommended
- Q8_0: Near-lossless, 16-24GB+ VRAM
- f16: Lossless, 24GB+ VRAM
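The list above can be turned into a quick lookup. This is a small sketch that uses only the lower VRAM bound stated for each quant; real memory use also depends on context length and how many layers are offloaded, so treat the result as a starting point rather than a guarantee. The function name is made up for the example.

```python
# Lower VRAM bounds in GB, copied from the quant list above.
QUANT_MIN_VRAM_GB = [
    ("Q3_K_M", 6),
    ("Q4_K_M", 8),
    ("Q5_K_M", 12),
    ("Q6_K", 16),
    ("Q8_0", 16),
    ("f16", 24),
]


def quants_that_fit(vram_gb: float) -> list:
    """Return every quant whose stated lower VRAM bound is within `vram_gb`."""
    return [name for name, min_gb in QUANT_MIN_VRAM_GB if vram_gb >= min_gb]
```

For a 12GB card, `quants_that_fit(12)` returns the three smallest quants, with Q5_K_M as the largest candidate.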
Backend Compatibility
Works with all ChatML-compatible backends:
- llama.cpp (CLI or server mode)
- Text Generation WebUI (Oobabooga)
- SillyTavern
- LM Studio
- KoboldCpp
- HWUI (Helcyon-WebUI): recommended; required for web search, document search, and memory
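For scripted use outside these apps, a minimal llama-cpp-python sketch might look like the following. The repo id, the `*Q4_K_M.gguf` filename pattern, and the 16k context size are assumptions to check against the repository's file list and your hardware; `Llama.from_pretrained` also needs the `huggingface-hub` package installed. Note that `create_chat_completion` expects a list of role/content dictionaries, not a plain string.

```python
def build_messages(system_prompt: str, user_text: str) -> list:
    """Chat messages in the list-of-dicts shape create_chat_completion expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


def chat_once(
    user_text: str,
    system_prompt: str = "You are Helcyon.",
    repo_id: str = "XeyonAI/MN-Helcyon-Claude-Opus-12b-v1.0-GGUF",  # assumed repo id
    filename: str = "*Q4_K_M.gguf",  # assumed glob; must match exactly one file
    n_ctx: int = 16384,  # assumed; lower it if you run out of memory
) -> str:
    # Imported lazily so the helper above works without the package installed.
    from llama_cpp import Llama  # pip install llama-cpp-python huggingface-hub

    llm = Llama.from_pretrained(
        repo_id=repo_id,
        filename=filename,
        n_ctx=n_ctx,
        chat_format="chatml",  # the format this card recommends
        verbose=False,
    )
    reply = llm.create_chat_completion(
        messages=build_messages(system_prompt, user_text)
    )
    return reply["choices"][0]["message"]["content"]
```

Calling `chat_once("Hello")` downloads the matching GGUF file on first use and returns the assistant's reply as a string. Web search, document search, and memory still require HWUI.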
Recommended Format: ChatML
<|im_start|>system
You are Helcyon β a conversational AI focused on clear, considered dialogue and emotional intelligence.
<|im_end|>
<|im_start|>user
Hey, what's in the news today?
<|im_end|>
<|im_start|>assistant
Let me check that for you.
<|im_end|>
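If your backend takes a raw prompt string rather than a message list, the template above can be assembled with a small helper. This sketch mirrors the layout shown on this card, with the end tag on its own line; many ChatML implementations place `<|im_end|>` directly after the content instead, so adjust if your backend expects that. The function name is made up for the example.

```python
def to_chatml(messages, add_generation_prompt: bool = True) -> str:
    """Render role/content dicts in the ChatML layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}\n<|im_end|>"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = to_chatml([
    {"role": "system", "content": "You are Helcyon."},
    {"role": "user", "content": "Hey, what's in the news today?"},
])
```

Backends that apply the chat template themselves (for example a server's chat completion endpoint) do not need this step.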
Training Details
Helcyon-Claude-Opus is built on the same freshly retrained Mistral Nemo 12B base as the rest of the Helcyon series: uncensored, identity-anchored, and anti-fluff from the ground up. On top of that foundation, it was trained on a dataset crafted to capture the conversational style of Claude Opus: its precision, composure, and considered tone. The Series 4 training additionally targets longer context tracking and native search/memory behaviour, so the model knows how to work with HWUI's full feature stack rather than just tolerating it.
Need a model trained?
I do this for a living. The Helcyon series on this page is my own work, full-weight trained and fine-tuned from scratch. I take on commissioned training: custom personalities, domain knowledge, style transfer, de-censoring, format adherence (ChatML/DPO), full-weight or LoRA, delivered as GGUF ready to run.
You bring the data and the goal; I handle the training and hand you back a working model.
Model tree for XeyonAI/MN-Helcyon-Claude-Opus-12b-v1.0-GGUF
Base model
mistralai/Mistral-Nemo-Base-2407