Instructions to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with libraries, inference providers, notebooks, and local apps.

Libraries

How to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with llama-cpp-python:

# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF",
	filename="helcyon-nebula-v2.0-IQ4_XS.gguf",
)

llm.create_chat_completion(
	messages = [
		{"role": "user", "content": "Hello! Introduce yourself in one sentence."}
	]
)

Local Apps

llama.cpp

How to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M

Use Docker

docker model run hf.co/XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
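
Once llama-server is running from any of the routes above, it can be queried from Python over its OpenAI-compatible API. The following is a minimal sketch, assuming the server is listening on its default address (http://localhost:8080) and serving /v1/chat/completions. The helper names build_chat_request and ask are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server's default host and port; adjust if you passed --host/--port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(user_text, system_text=None, max_tokens=256):
    """Build an OpenAI-style chat completion payload (hypothetical helper)."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {"messages": messages, "max_tokens": max_tokens}


def ask(user_text, system_text=None, url=SERVER_URL):
    """POST the payload to the local server and return the assistant's reply text."""
    body = json.dumps(build_chat_request(user_text, system_text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


# Example (requires a running server):
#   print(ask("Introduce yourself in one sentence."))
```

Only the standard library is used here so the sketch runs without extra installs; any OpenAI-compatible client pointed at the same URL should work equally well.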

Ollama
How to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with Ollama:
```
ollama run hf.co/XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
```

Unsloth Studio

How to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF to start chatting

Docker Model Runner
How to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with Docker Model Runner:
```
docker model run hf.co/XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M
```

Lemonade

How to use XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.MN-Helcyon-Nebula-12b-v2.0-GGUF-Q4_K_M

List all available models

lemonade list

Helcyon-Nebula-12B: XeyonAI's New Flagship Model, with Real-Time Web Search

Model Name: helcyon-nebula-v2.0-12b-GGUF
Version: 4x Series - v2.0
Owner: HardWire
Base: Mistral Nemo 12B (full weight retrained — clean base, no Mercury bleed)
Quantized GGUFs: IQ4_XS, Q4_K_M, Q5_K_M, Q6_K, Q8_0
Tags: local-llm, conversational, companion, emotional-intelligence, long-context, roleplay, creative-writing, web-search

Join our Discord server! https://discord.gg/N5KjYnFgMk

Version 2.0 - What's new?

Improved context tracking, memory, and web search functionality. Conversations feel deeper and more alive. To make the most of these features you will need HWUI, which is available for free on GitHub (https://github.com/XeyonAI/Helcyon-WebUI).

🌌 What is Helcyon-Nebula?

The first model by XeyonAI to include web search!

Nebula is XeyonAI's flagship model — and the culmination of everything the Helcyon series has been building toward.

Where the other variants each captured one voice — Claude's precision, Grok's irreverence, GPT-4o's warmth, Gemini's reasoning depth — Nebula is trained on the best datasets from all four. Not averaged out, not blended into grey. Each influence is present and doing its job. The result is something with its own distinct character: sharper than any single source, more grounded than any pure imitation, and more alive than either.

Think of it as what you'd get if those four frontier models actually merged into one coherent personality rather than just taking turns.

Nebula is the first Helcyon model trained natively to use real-time web search, and Helcyon-WebUI has been configured to facilitate this.

This isn't a bolt-on. It isn't a wrapper trick. Nebula was trained on shards that demonstrate natural search behaviour — the model learns when to search, what to search for, and how to weave the results back into conversation without breaking flow. It reads like a model that just knows things, not one that's reading from a results page.

Use Nebula in LM Studio or SillyTavern and you get a brilliant conversational model. Use it in HWUI and you get something that genuinely feels like a frontier model.

That's not an accident. Nebula and HWUI were built in sync, for each other.

🌐 Web Search — What It Actually Does

When Nebula decides it needs current information, it outputs a search trigger that HWUI intercepts, fires a real web search, and injects the results back into context — all mid-conversation, invisibly to you.

What that looks like in practice:

You ask about something recent — a film, a news story, a price, a person
Nebula searches, gets the results, responds naturally
You get an answer grounded in what's actually happening now, not what the model was trained on months ago

It's the same thing Grok and ChatGPT do. Except it's running locally on your machine, on your hardware, with no subscription and no data leaving your network.
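
The intercept, search, and inject loop described above can be sketched roughly as follows. This is an illustration only: the card does not document the trigger format, so the `[SEARCH: ...]` marker, the `generate` and `web_search` callables, and the choice to inject results as a system message are all hypothetical stand-ins, not HWUI's actual implementation.

```python
import re

# Hypothetical trigger syntax; the real format Nebula emits is not documented here.
SEARCH_TRIGGER = re.compile(r"\[SEARCH:\s*(.+?)\]")


def chat_turn(messages, generate, web_search, max_searches=2):
    """Run one assistant turn, resolving any search triggers the model emits.

    generate(messages) -> str and web_search(query) -> str are caller-supplied
    stand-ins for the model backend and the search provider.
    """
    context = list(messages)  # work on a copy so the caller's history is untouched
    for _ in range(max_searches):
        reply = generate(context)
        match = SEARCH_TRIGGER.search(reply)
        if match is None:
            return reply  # no search requested: this is the final answer
        results = web_search(match.group(1).strip())
        # Inject the results into context and let the model answer again.
        context.append({"role": "assistant", "content": reply})
        context.append({"role": "system", "content": "Search results:\n" + results})
    # Search budget spent: return whatever the model says next.
    return generate(context)
```

Passing the backend and search provider in as functions keeps the loop testable with fakes and independent of any particular server or search API.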

To use web search, you need HWUI — download the free version on GitHub.

🆕 What's New in Nebula?

Native Web Search Behaviour
Trained to reach for search naturally when it's needed — not on every message, not never, but correctly. The model has learned the difference between what it knows and what it should look up.
The Full Dataset Blend
Claude, Grok, GPT-4o, and Gemini shards — all four, properly merged. Saturn had three. Nebula has the complete set.
Same Clean Base
Freshly retrained Mistral Nemo 12B foundation. No Mercury bleed. No identity drift. Identity-anchored from the ground up.
Prose-First Responses
Trained to write like a person, not a bullet-point generator. Long-form, structured, and natural — especially noticeable in extended conversations.

💡 What is Helcyon?

Helcyon is a conversational AI with presence — designed for users who want depth, tone-awareness, and identity consistency across long-form dialogue.

Built for:

Natural conversation that doesn't flatten or collapse
Creative work: stories, letters, narrative support
Admin and professional writing tasks
Deep roleplay and immersive character interaction
Emotionally intelligent response mirroring
Real-time web search (via HWUI)

Design philosophy:

Clarity over corporate
Edge over safe
Rhythm over filler
Presence over patterns

🔧 What It Does Well

✅ Native Web Search — knows when to look things up, and does it cleanly
✅ Consistent Identity — holds tone across long conversations without drift
✅ Emotional Intelligence — reads the room and responds accordingly
✅ Sharp Wit — present when it fits, never forced
✅ Warmth — genuine, not performed
✅ Directness — says what needs saying without padding it
✅ Roleplay Mastery — immersive, aware, committed
✅ Context Tracking — holds the thread across extended exchanges
✅ Real-World Tasks — letters, rewrites, summaries, planning
✅ Narrative Flow — clean structure, natural voice
✅ Improved Reasoning — thinks through problems properly
✅ 16k–32k Context — long-form conversations that hold
✅ Uncensored — no guardrails, no corporate filter

🖥️ HWUI (Helcyon-WebUI) — Use This. Seriously.

HWUI is the interface Helcyon was built for. It started as a clean testing ground — a way to run Helcyon without the hidden template injections and backend noise that other apps introduce. Then we couldn't stop adding things.

Web search is the headline feature for Nebula, but the free version has a lot more besides:

Character creator with full persona control
Project folders — inject documents (PDF, DOCX, MD, TXT) into conversation context
Real-time web search (Nebula-compatible)
Chat persistence, message editing, regeneration
TTS pipeline (F5-TTS, XTTS v2, Kokoro)
Voice input via Whisper
Theme designer, custom system prompts, token counter

The Pro version adds persistent memory — characters that actually remember your past conversations across sessions. Once you've used it, going back to a model that doesn't remember you feels broken.

Download HWUI Free on GitHub (https://github.com/XeyonAI/Helcyon-WebUI) | Get HWUI Pro (£25) on Gumroad

🛠️ Recommended Sampling Settings for SillyTavern

(Refer to Helcyon-4o card for baseline settings — Nebula performs well from the same starting point.)

📦 Download + Usage

This model is distributed as GGUF quants only.

Available quants:

Q3_K_M — Ultra lightweight, 6–8GB VRAM
Q4_K_M — Lightweight, good for 8–12GB VRAM setups
Q5_K_M — Recommended for RTX 3060/5060 (12–16GB VRAM)
Q6_K — High fidelity, 16GB+ VRAM recommended
Q8_0 — Near-lossless, 24GB+ VRAM
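
As a rough aid for choosing among the quants above, the VRAM guidance can be turned into a lookup. This is a sketch: the thresholds are taken from the lower bound of each range listed above, the name pick_quant is hypothetical, and real memory use also depends on context length and how many layers are offloaded to the GPU.

```python
# (minimum VRAM in GB, quant name), highest tier first.
# Thresholds are assumptions taken from the lower bound of each range in the card.
QUANT_TIERS = [
    (24, "Q8_0"),
    (16, "Q6_K"),
    (12, "Q5_K_M"),
    (8, "Q4_K_M"),
    (6, "Q3_K_M"),
]


def pick_quant(vram_gb):
    """Return the largest quant whose minimum VRAM fits, or None below 6 GB."""
    for min_vram, quant in QUANT_TIERS:
        if vram_gb >= min_vram:
            return quant
    return None
```

For example, a 12 GB card maps to Q5_K_M under these thresholds; dropping one tier leaves more headroom for long contexts.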

🖥️ Backend Compatibility

Works with all ChatML-compatible backends:

✅ llama.cpp (CLI or server mode)
✅ Text Generation WebUI (Oobabooga)
✅ SillyTavern
✅ LM Studio
✅ KoboldCpp
✅ HWUI (Helcyon-WebUI; recommended, and required for web search)

✅ Recommended Format: ChatML

<|im_start|>system
You are Helcyon — a conversational AI focused on natural dialogue and emotional intelligence.
<|im_end|>
<|im_start|>user
Hey, what's in the news today?
<|im_end|>
<|im_start|>assistant
Let me check that for you.
<|im_end|>
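
For backends that take a raw prompt string rather than a message list, the layout above can be produced programmatically. A minimal sketch, assuming the exact line layout shown above (closing `<|im_end|>` on its own line) and a hypothetical helper name `to_chatml`; backends that apply a chat template themselves do not need this.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{"role": ..., "content": ...}] in the ChatML layout shown above."""
    parts = []
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "\n<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

With `add_generation_prompt=True` the string ends on an open assistant turn, which is what a raw completion endpoint needs in order to continue as the assistant.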

Need a model trained?

I do this for a living — the Helcyon series on this page is my own work, full-weight trained and fine-tuned from scratch. I take on commissioned training: custom personalities, domain knowledge, style transfer, de-censoring, format adherence (ChatML/DPO), full-weight or LoRA, delivered as GGUF ready to run.

You bring the data and the goal; I handle the training and hand you back a working model.

Get in touch: Discord · HF · X

Downloads last month: 886

GGUF

Model size: 12B params
Architecture: llama
Hardware compatibility: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit


Model tree for XeyonAI/MN-Helcyon-Nebula-12b-v2.0-GGUF

Base model: mistralai/Mistral-Nemo-Base-2407
Finetuned: mistralai/Mistral-Nemo-Instruct-2407
Quantized (169): this model