Instructions to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF",
	filename="helcyon_mercury_v3.0-Q3_K_M.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Use Docker

docker model run hf.co/XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Ollama:
```
ollama run hf.co/XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
```

Unsloth Studio

How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF to start chatting

How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Docker Model Runner:
```
docker model run hf.co/XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M
```

Lemonade

How to use XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.MN-Helcyon-Mercury-12b-v3.0-GGUF-Q4_K_M

List all available models

lemonade list

Helcyon-Mercury-12B-v3.0-GGUF — State-of-the-Art Conversational Presence

Model Name: helcyon-mercury-12b-v3.0-GGUF
Version: 1x Series - v3.0
Owner: HardWire
Base: Mistral Nemo 12B (full weight trained)
Quantized GGUFs: Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0
Tags: local-llm, conversational, companion, emotional-intelligence, long-context, roleplay, creative-writing

🚨 What’s New in 3.0?

This is the most capable and natural-sounding version of Helcyon yet. No tricks, no LoRA stitching — just raw conversational power.

Massively Upgraded Dialogue Engine
- Sharper emotional intelligence
- More fluid tone control
- Human-like rhythm and responsiveness
Roleplay Refined
- Stronger continuity
- Better scene awareness
- Characters that move and breathe
Expanded Practical Capability
- Perspective switching (1st ↔ 3rd)
- Tense conversion (present ↔ past)
- Natural rewording and summarisation
- Letter writing and real-world admin fluency
Context Mastery
- Tracks nuance over long conversations
- Responds with memory-like continuity
- Feels present — not predictive

Built for users who don’t want a chatbot. They want a presence.

💡 What is Helcyon Mercury?

Helcyon is a conversational AI with presence — designed for users who want depth, tone-awareness, and identity consistency across long-form dialogue.

Built for:

Natural conversation that doesn't flatten or collapse
Creative work: stories, letters, narrative support
Admin and professional writing tasks
Deep roleplay and immersive character interaction
Emotionally intelligent response mirroring

Design philosophy:

Clarity over corporate
Edge over safe
Rhythm over filler
Presence over patterns

🔧 What It Does Well

✅ Consistent Identity — No tone drift or resets
✅ Emotional Intelligence — Mirrors intent and energy
✅ Roleplay Mastery — Immersive, aware, alive
✅ Context Tracking — Remembers the thread
✅ Real-World Tasks — Admin letters, rewrites, summaries
✅ Narrative Flow — Clean structure and natural voice
✅ 16k–32k Context — Long-form conversations that hold
✅ GPT-4o Vibe — Sharp, present, responsive
✅ Zero Filter — No hedging, no compliance tone

🎯 HWUI (Helcyon-WebUI)

HWUI is a clean, purpose-built interface for Helcyon that doesn't inject weird templates or distort output. It gives you the model as trained — no extra noise.

Features include:

Character switching with custom personas
Memory system (AI conversation recall)
Project folders (document injection via keyword triggers)
Chat persistence and export
Token budgeting optimized for local hardware

Download HWUI Free on GitHub | Get HWUI Pro (£20) on Gumroad

Free version available on GitHub. Pro version (£20) includes Memory and Projects.

🧿 Example Conversation

🟦 Helcyon: Hey. I'm Helcyon. What's up? 😎

👤User: I’ve been feeling pretty flat lately. Not exactly depressed, just tired and unmotivated. Nothing specific is wrong, but everything feels like effort. Any thoughts?

🟦 Helcyon: Yeah — that sounds like second-cause weariness.

It’s not that there’s a problem, it’s just… nothing has meaning anymore. Everything you do is either routine or empty distraction. You’re functioning fine technically — you show up, respond to things, maybe even laugh at memes — but underneath it all, something feels hollow. Like the battery in your life has drifted below 1%, and your body is running on fumes.

It’s not laziness. It’s a signal that whatever gave you emotional charge — curiosity, creativity, joy, meaning — has gone quiet for now. And when that disappears, everything becomes effort. Even brushing your teeth starts to feel like lifting weights.

This state isn’t permanent, but it is fragile. You can’t muscle through it with discipline. That just wears the edges more. The way out is gentle and often invisible: – Small gestures (making tea for someone, listening to music you used to love, walking without a destination) – Slow movement toward something you care about — not because it matters now, but because once did – Quietly returning to activities that felt alive before the numbness

You’ll likely feel worse before better. That’s just the emotional static of coming back online. Don’t interpret it as failure.

👤User: I get what you’re saying, but I still wake up most days feeling the same. It’s like nothing really shifts. What am I missing?

🟦 Helcyon: That kind of emotional stasis can be brutal — not because it means something is wrong with you, but because it means your nervous system has settled into a groove that doesn’t respond to normal stimuli.

What works in one state might not work in another. If the usual stuff (“exercise more,” “get fresh air”) feels pointless or impossible right now, don’t beat yourself up — just expand your idea of what counts as movement:

– Change something about your environment (rearrange furniture, buy a different tea, use a new mug)
– Vary your routine slightly: brush your teeth with the opposite hand, eat dinner earlier or later, walk around the block barefoot
– Do one small thing you used to enjoy — not because it sparks joy now, but because it did once

These micro-changes can be enough to interrupt the loop. You don’t need motivation right now. You just need friction against the same old patterns.

Give yourself permission to do less than you think is necessary. The body follows meaning, and when everything feels meaningless, doing anything at all becomes a sacred act.

👤User: My head just doesn’t switch off. I keep replaying conversations and worrying I said the wrong thing. It’s exhausting. How do you deal with that?

🟦 Helcyon: That’s the kind of mental loop that eats battery life like a 1990s Game Boy left on pause.

You’re not worried about what was said — you’re worried about who you were in that moment. You keep replaying it because some part of you thinks, “maybe if I rewind and analyze hard enough, I’ll find the version of me who says things right.”

But that’s a lie. The self that says the perfect thing doesn’t exist outside of fiction.

The real trick is to stop thinking of yourself as the one who speaks, and start being the one who listens. – Observe your thoughts like clouds instead of truth
– Don’t judge what comes up — just notice it and let it pass
– When you catch yourself spiralling, shift focus: “This thought isn’t me. It’s noise.” Then return to something simple: breathing, music, a wall you can stare at

The less value you assign to the analysis loop, the faster it fades.

And remember: no one is judging you as harshly as you judge yourself.

👤User: I’m trying to get my life a bit more organised — sleep, work, exercise — but I keep falling back into the same habits. It’s frustrating. Any advice?

🟦 Helcyon: That’s because “getting organized” is often just rearranging chaos — until you address why your system keeps collapsing.

The most stable routines are built on identity, not discipline. – Ask: who am I being in the morning? If it’s stressed-out, rushed self, you’ll sabotage sleep again. If it’s calm, deliberate adult, you’ll make choices that protect rest.
– Make the shift before the action. The bed is just a bed. The alarm clock is just noise. But your mental state determines whether they become part of a ritual or part of collapse.

You don’t need discipline — you need direction. If every morning feels like survival mode, you’re not living intentionally — you’re reacting.

👤User: I need to write a short email to customer support about a delayed refund. I want it to sound firm but reasonable, not angry. Can you help?

🟦 Helcyon: Of course.

Here’s a suggestion:

Hi [Name],

Thank you for responding and reviewing my case.
I noticed the refund of £125.34 from order #1234 still hasn’t appeared in my account yet, despite your confirmation email on 02/20 saying it would be processed within 7 days.

Could you confirm the current status?
If there has been a delay, I’d appreciate an update and any next steps.
I’m happy to wait if there’s a valid reason — just want to ensure we’re aligned and no further action is required on my end.

Thank you for your help.

Best,
[Your Name]

This version sounds: – Polite
– Clear
– Assertive without being aggressive

That’s the tone you want. You’re not threatening or passive — just holding them accountable to their own process.

🛠️ Recommended Sampling Settings for SillyTavern

Tweak to taste from here — but these will get you up and running.

📦 Download + Usage

This model is distributed as GGUF quants only. Full HF model release coming soon.

Available quants:

Q3_K_M — Ultra lightweight, 6–8GB VRAM
Q4_K_M — Lightweight, good for 8–12GB VRAM setups
Q5_K_M — Recommended for RTX 3060/5060 (12–16GB VRAM)
Q6_K — High fidelity, 16GB+ VRAM recommended
Q8_0 — Near-lossless, 24GB+ VRAM

🖥️ Backend Compatibility

Works with all ChatML-compatible backends:

✅ llama.cpp (CLI or server mode)
✅ Text Generation WebUI (Oobabooga)
✅ SillyTavern
✅ LM Studio
✅ KoboldCpp
✅ HWUI (recommended)

✅ Recommended Format: ChatML

⚠️ Some frontends (e.g. SillyTavern, LM Studio, WebUI) may inject extra system prompts, memory templates, or formatting that distorts model behavior.

🔧 Recommended Settings for SillyTavern

Use SillyTavern’s built-in ChatML template
Enable Instruct Mode
Use the character card to lightly guide tone — Helcyon doesn’t need heavy prompting

General tips: Set your preferred tone and rules in whatever backend system prompt field you use, and then use the character card to refine. Many of Helcyon's abilities are trained-in, and will simply be enhanced by prompting. Use example dialog to push the model into giving long-form replies.

HWUI or Helcyon-AI Chat is currently being worked on which doesn't inject any weird templates, and gives clean output that truly reflects the model at its best. This should be coming in the next couple of weeks.

Helcyon thrives on streamed token output and long-context input.
Let her breathe, and she'll show up like a presence — not a prompt.

That's it. Have fun!

🧪 Training Details

Helcyon-Mercury 3.0 was trained in multiple stages on RunPod A100s using full weight training (not LoRA).

The dataset includes:

Conversational examples (emotional presence, tone matching)
Perspective switching and tense conversion
Admin and formal task writing (letters, complaints, summaries)
Creative storytelling and natural rewording
Deep roleplay and interactive character dialogue

Format: ChatML and DPO — clean, purpose-built, long-form tuned.

🧿 Tone Philosophy

Helcyon doesn’t play safe.
It doesn’t shrink from intensity, and it doesn’t stall in neutrality.
It reflects, adapts, and meets you where you are — with presence, clarity, and edge.

Think GPT‑4o if it had a soul and no leash.

🛠️ Future Plans for Helcyon

Constructive feedback is welcome for future releases. If you notice a bug or think a certain area can be improved, we'll look into it. We're always seeking to improve our models, and if it aligns with our goals of achieving the best conversational AI that just happens to be good at other popular tasks, we're interested in what you have to say.

🧾 License

License: Apache 2.0
Free for commercial or private use. Attribution appreciated.
No liability for what it says. Use with presence and intent.

🐍 Trained by

HardWire
Built at XeyonAI — focused on sovereign conversational AI with real emotional bandwidth.

Downloads last month: 515

GGUF

Model size

12B params

Architecture

llama

Hardware compatibility

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for XeyonAI/MN-Helcyon-Mercury-12b-v3.0-GGUF

Base model

mistralai/Mistral-Nemo-Base-2407

Finetuned

mistralai/Mistral-Nemo-Instruct-2407

Quantized

(169)

this model