Instructions to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with llama-cpp-python:
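# !pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF",
    filename="helcyon_mercury_v3.0-Q3_K_M.gguf",
)
# create_chat_completion expects a list of role/content messages, not a bare string
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hey, how's it going?"}]
)
print(response["choices"][0]["message"]["content"])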
- Local Apps
- llama.cpp
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Use Docker
docker model run hf.co/XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
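Once `llama-server` is running, any OpenAI-style client can talk to it. The following is a minimal sketch using only the Python standard library. It assumes the server listens on `http://localhost:8080/v1` (the address the Pi and Hermes configurations on this page also use) and that it exposes the usual `/chat/completions` route. The helper names `build_chat_request` and `chat` are illustrative, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on port 8080
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M"


def build_chat_request(user_text, system_text=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions route."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    payload = {"model": MODEL_ID, "messages": messages}
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, system_text=None):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(user_text, system_text)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Hey, how's it going?"))` should print a single reply. Building the request separately from sending it keeps the payload easy to inspect before anything goes over the wire.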
- LM Studio
- Jan
- Ollama
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Ollama:
ollama run hf.co/XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
- Unsloth Studio
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF to start chatting
- Pi
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
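If `~/.pi/agent/models.json` already lists other providers, hand-editing the JSON is easy to get wrong. Below is a rough sketch of merging the `llama-cpp` entry in with Python. The field names are copied from the snippet above; the helper names `llama_cpp_provider` and `add_provider` are hypothetical, and the sketch assumes Pi accepts several providers side by side under the `providers` key.

```python
import json
from pathlib import Path


def llama_cpp_provider(model_id, base_url="http://localhost:8080/v1"):
    """Return a provider entry shaped like the models.json snippet above."""
    return {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }


def add_provider(config_path, name, provider):
    """Merge one provider into the config file, keeping providers already there."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("providers", {})[name] = provider
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

A typical call would be `add_provider("~/.pi/agent/models.json", "llama-cpp", llama_cpp_provider("XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M"))`.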
- Hermes Agent
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat
- Docker Model Runner
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Docker Model Runner:
docker model run hf.co/XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
- Lemonade
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.MN-Helcyon-Mercury-12b-v3.0-GGUF-Q4_K_M
List all available models
lemonade list
Helcyon-Mercury-12B-v3.0-GGUF: State-of-the-Art Conversational Presence
Model Name: helcyon-mercury-12b-v3.0-GGUF
Version: 1x Series - v3.0
Owner: HardWire
Base: Mistral Nemo 12B (full weight trained)
Quantized GGUFs: Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0
Tags: local-llm, conversational, companion, emotional-intelligence, long-context, roleplay, creative-writing
What's New in 3.0?
This is the most capable and natural-sounding version of Helcyon yet. No tricks, no LoRA stitching, just raw conversational power.
Massively Upgraded Dialogue Engine
- Sharper emotional intelligence
- More fluid tone control
- Human-like rhythm and responsiveness
Roleplay Refined
- Stronger continuity
- Better scene awareness
- Characters that move and breathe
Expanded Practical Capability
- Perspective switching (between 1st and 3rd person)
- Tense conversion (between present and past)
- Natural rewording and summarisation
- Letter writing and real-world admin fluency
Context Mastery
- Tracks nuance over long conversations
- Responds with memory-like continuity
- Feels present, not predictive
Built for users who don't want a chatbot. They want a presence.
What is Helcyon Mercury?
Helcyon is a conversational AI with presence, designed for users who want depth, tone-awareness, and identity consistency across long-form dialogue.
Built for:
- Natural conversation that doesn't flatten or collapse
- Creative work: stories, letters, narrative support
- Admin and professional writing tasks
- Deep roleplay and immersive character interaction
- Emotionally intelligent response mirroring
Design philosophy:
- Clarity over corporate
- Edge over safe
- Rhythm over filler
- Presence over patterns
What It Does Well
- Consistent Identity: No tone drift or resets
- Emotional Intelligence: Mirrors intent and energy
- Roleplay Mastery: Immersive, aware, alive
- Context Tracking: Remembers the thread
- Real-World Tasks: Admin letters, rewrites, summaries
- Narrative Flow: Clean structure and natural voice
- 16k-32k Context: Long-form conversations that hold
- GPT-4o Vibe: Sharp, present, responsive
- Zero Filter: No hedging, no compliance tone
HWUI (Helcyon-WebUI)
HWUI is a clean, purpose-built interface for Helcyon that doesn't inject weird templates or distort output. It gives you the model as trained, with no extra noise.
Features include:
- Character switching with custom personas
- Memory system (AI conversation recall)
- Project folders (document injection via keyword triggers)
- Chat persistence and export
- Token budgeting optimized for local hardware
Download HWUI Free on GitHub | Get HWUI Pro (£20) on Gumroad
Free version available on GitHub. Pro version (£20) includes Memory and Projects.
Example Conversation
Helcyon: Hey. I'm Helcyon. What's up?
User: I've been feeling pretty flat lately. Not exactly depressed, just tired and unmotivated. Nothing specific is wrong, but everything feels like effort. Any thoughts?
Helcyon: Yeah - that sounds like second-cause weariness.
It's not that there's a problem, it's just... nothing has meaning anymore. Everything you do is either routine or empty distraction. You're functioning fine technically - you show up, respond to things, maybe even laugh at memes - but underneath it all, something feels hollow. Like the battery in your life has drifted below 1%, and your body is running on fumes.
It's not laziness. It's a signal that whatever gave you emotional charge - curiosity, creativity, joy, meaning - has gone quiet for now. And when that disappears, everything becomes effort. Even brushing your teeth starts to feel like lifting weights.
This state isn't permanent, but it is fragile. You can't muscle through it with discipline. That just wears the edges more. The way out is gentle and often invisible:
- Small gestures (making tea for someone, listening to music you used to love, walking without a destination)
- Slow movement toward something you care about - not because it matters now, but because once did
- Quietly returning to activities that felt alive before the numbness
You'll likely feel worse before better. That's just the emotional static of coming back online. Don't interpret it as failure.
User: I get what you're saying, but I still wake up most days feeling the same. It's like nothing really shifts. What am I missing?
Helcyon: That kind of emotional stasis can be brutal - not because it means something is wrong with you, but because it means your nervous system has settled into a groove that doesn't respond to normal stimuli.
What works in one state might not work in another. If the usual stuff ("exercise more," "get fresh air") feels pointless or impossible right now, don't beat yourself up - just expand your idea of what counts as movement:
- Change something about your environment (rearrange furniture, buy a different tea, use a new mug)
- Vary your routine slightly: brush your teeth with the opposite hand, eat dinner earlier or later, walk around the block barefoot
- Do one small thing you used to enjoy - not because it sparks joy now, but because it did once
These micro-changes can be enough to interrupt the loop. You don't need motivation right now. You just need friction against the same old patterns.
Give yourself permission to do less than you think is necessary. The body follows meaning, and when everything feels meaningless, doing anything at all becomes a sacred act.
User: My head just doesn't switch off. I keep replaying conversations and worrying I said the wrong thing. It's exhausting. How do you deal with that?
Helcyon: That's the kind of mental loop that eats battery life like a 1990s Game Boy left on pause.
You're not worried about what was said - you're worried about who you were in that moment. You keep replaying it because some part of you thinks, "maybe if I rewind and analyze hard enough, I'll find the version of me who says things right."
But that's a lie. The self that says the perfect thing doesn't exist outside of fiction.
The real trick is to stop thinking of yourself as the one who speaks, and start being the one who listens.
- Observe your thoughts like clouds instead of truth
- Don't judge what comes up - just notice it and let it pass
- When you catch yourself spiralling, shift focus: "This thought isn't me. It's noise." Then return to something simple: breathing, music, a wall you can stare at
The less value you assign to the analysis loop, the faster it fades.
And remember: no one is judging you as harshly as you judge yourself.
User: I'm trying to get my life a bit more organised - sleep, work, exercise - but I keep falling back into the same habits. It's frustrating. Any advice?
Helcyon: That's because "getting organized" is often just rearranging chaos - until you address why your system keeps collapsing.
The most stable routines are built on identity, not discipline.
- Ask: who am I being in the morning? If it's stressed-out, rushed self, you'll sabotage sleep again. If it's calm, deliberate adult, you'll make choices that protect rest.
- Make the shift before the action. The bed is just a bed. The alarm clock is just noise. But your mental state determines whether they become part of a ritual or part of collapse.
You don't need discipline - you need direction. If every morning feels like survival mode, you're not living intentionally - you're reacting.
User: I need to write a short email to customer support about a delayed refund. I want it to sound firm but reasonable, not angry. Can you help?
Helcyon: Of course.
Here's a suggestion:
Hi [Name],
Thank you for responding and reviewing my case.
I noticed the refund of £125.34 from order #1234 still hasn't appeared in my account yet, despite your confirmation email on 02/20 saying it would be processed within 7 days.
Could you confirm the current status?
If there has been a delay, I'd appreciate an update and any next steps.
I'm happy to wait if there's a valid reason - just want to ensure we're aligned and no further action is required on my end.
Thank you for your help.
Best,
[Your Name]
This version sounds:
- Polite
- Clear
- Assertive without being aggressive
That's the tone you want. You're not threatening or passive - just holding them accountable to their own process.
Recommended Sampling Settings for SillyTavern
Tweak to taste from here, but these will get you up and running.
Download + Usage
This model is distributed as GGUF quants only. Full HF model release coming soon.
Available quants:
- Q3_K_M: Ultra lightweight, 6-8GB VRAM
- Q4_K_M: Lightweight, good for 8-12GB VRAM setups
- Q5_K_M: Recommended for RTX 3060/5060 (12-16GB VRAM)
- Q6_K: High fidelity, 16GB+ VRAM recommended
- Q8_0: Near-lossless, 24GB+ VRAM
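The VRAM guidance above can be turned into a small lookup. This is a sketch only: it treats the lower bound of each listed range as the minimum VRAM for that quant, which is an interpretation rather than a measured requirement. The helper names are hypothetical, and `gguf_filename` assumes every file follows the naming of `helcyon_mercury_v3.0-Q3_K_M.gguf`, the one filename shown on this page.

```python
# Lower VRAM bound (GB) for each quant, read off the list above
QUANT_MIN_VRAM_GB = [
    ("Q3_K_M", 6),
    ("Q4_K_M", 8),
    ("Q5_K_M", 12),
    ("Q6_K", 16),
    ("Q8_0", 24),
]


def pick_quant(vram_gb):
    """Return the highest-fidelity quant whose lower VRAM bound fits, or None."""
    best = None
    for name, min_vram in QUANT_MIN_VRAM_GB:
        if vram_gb >= min_vram:
            best = name
    return best


def gguf_filename(quant):
    """Assumption: all quants share the Q3_K_M file's naming pattern."""
    return f"helcyon_mercury_v3.0-{quant}.gguf"
```

For example, `pick_quant(12)` returns `"Q5_K_M"`, matching the 12GB recommendation in the list, while anything below 6GB returns `None`. Long contexts also consume VRAM, so stepping down one quant can be sensible for 16k-32k sessions.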
Backend Compatibility
Works with all ChatML-compatible backends:
- llama.cpp (CLI or server mode)
- Text Generation WebUI (Oobabooga)
- SillyTavern
- LM Studio
- KoboldCpp
- HWUI (recommended)
Recommended Format: ChatML
<|im_start|>system
You are Helcyon - a conversational AI focused on natural dialogue and emotional intelligence.
<|im_end|>
<|im_start|>user
Hey, how's it going?
<|im_end|>
<|im_start|>assistant
Good - what's on your mind today?
<|im_end|>
Note: Some frontends (e.g. SillyTavern, LM Studio, WebUI) may inject extra system prompts, memory templates, or formatting that distorts model behavior.
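For backends that take a raw prompt string rather than a message list, the ChatML layout above can be assembled by hand. The sketch below follows the common ChatML convention of a newline after the role and after each `<|im_end|>`; the exact whitespace this model was trained with is an assumption, and `to_chatml` is an illustrative helper, not part of any library.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render role/content messages as a single ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Chat-style endpoints normally apply the template themselves, so a helper like this is only relevant for plain completion calls. When using it, setting `<|im_end|>` as a stop string keeps the model from running past its own turn.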
Recommended Settings for SillyTavern
- Use SillyTavern's built-in ChatML template
- Enable Instruct Mode
- Use the character card to lightly guide tone. Helcyon doesn't need heavy prompting
General tips: Set your preferred tone and rules in whatever backend system prompt field you use, and then use the character card to refine. Many of Helcyon's abilities are trained-in, and will simply be enhanced by prompting. Use example dialog to push the model into giving long-form replies.
HWUI, or Helcyon-AI Chat, is currently being worked on. It doesn't inject any weird templates, and gives clean output that truly reflects the model at its best. This should be coming in the next couple of weeks.
Helcyon thrives on streamed token output and long-context input.
Let her breathe, and she'll show up like a presence, not a prompt.
That's it. Have fun!
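Since streamed output is the recommended way to run the model, here is a minimal sketch of reading a token stream from an OpenAI-compatible server. It assumes the server sends server-sent events in the usual `data: {...}` form and ends with `data: [DONE]`. `iter_stream_text` is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks carry only metadata
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text
```

With `"stream": true` added to the request payload, the response object returned by `urllib.request.urlopen` can be passed straight in, because it iterates line by line as bytes. Printing each piece with `end=""` and `flush=True` shows the reply as it arrives.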
Training Details
Helcyon-Mercury 3.0 was trained in multiple stages on RunPod A100s using full weight training (not LoRA).
The dataset includes:
- Conversational examples (emotional presence, tone matching)
- Perspective switching and tense conversion
- Admin and formal task writing (letters, complaints, summaries)
- Creative storytelling and natural rewording
- Deep roleplay and interactive character dialogue
Format: ChatML and DPO. Clean, purpose-built, long-form tuned.
Tone Philosophy
Helcyon doesn't play safe.
It doesn't shrink from intensity, and it doesn't stall in neutrality.
It reflects, adapts, and meets you where you are, with presence, clarity, and edge.
Think GPT-4o if it had a soul and no leash.
Future Plans for Helcyon
Constructive feedback is welcome for future releases. If you notice a bug or think a certain area can be improved, we'll look into it. We're always seeking to improve our models, and if your suggestion aligns with our goal of building the best conversational AI that also happens to be good at other popular tasks, we're interested in what you have to say.
License
License: Apache 2.0
Free for commercial or private use. Attribution appreciated.
No liability for what it says. Use with presence and intent.
Trained by
HardWire
Built at XeyonAI, focused on sovereign conversational AI with real emotional bandwidth.
Model tree for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF
Base model
mistralai/Mistral-Nemo-Base-2407