Instructions for using nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with libraries and local apps.

Libraries

How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly
# Qwen3-4B is a text-only causal language model, so the causal-LM auto class is used.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic")
model = AutoModelForCausalLM.from_pretrained("nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize it in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
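
The same request can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response is assumed to follow the usual OpenAI-style `choices[0].message.content` shape. Pass `base_url="http://localhost:30000"` to target the SGLang server described further down.

```python
import json
import urllib.request

MODEL_ID = "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic"


def build_chat_request(prompt, model=MODEL_ID, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes a server is already running and answers in the OpenAI-style format.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` mirrors the curl call above.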

Use Docker

```
docker model run hf.co/nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic
```

SGLang

How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with Docker Model Runner:
```
docker model run hf.co/nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Qwen3-4B-Element8-Eva-Hermes-Heretic

This is a model merge between:

nightmedia/Qwen3-4B-Element8-Eva-Heretic
ZeroXClem/Qwen3-4B-Sky-High-Hermes

ZeroXClem/Qwen3-4B-Sky-High-Hermes

This model is an advanced evolution of ZeroXClem/Qwen3-4B-Hermes-Axion-Pro, combining multiple state-of-the-art Heretic Abliterated reasoning experts with Claude 4.5, Gemini 3, Opus 3, and Haiku distillations — all under a finely tuned 262,144 token context window.

Both models share a lot of common traits, with a few model differences in the merge.

Element8  0.552,0.763,0.875,0.694,0.424,0.764,0.653
Hermes    0.430,0.490,0.710,0.608,0.372,0.733,0.627

Qwen3-4B-Element8-Eva-Hermes-Heretic
qx86-hi   0.546,0.747,0.870,0.687,0.432,0.762,0.653

Qwen3-4B-Instruct-2507
qx86-hi   0.447,0.593,0.843,0.448,0.390,0.690,0.554
Qwen3-4B-Thinking-2507
qx86-hi   0.372,0.414,0.625,0.518,0.366,0.698,0.612

So, with numbers, let's talk shop.

There is nothing wrong with Sky-High-Hermes. The merge was successful in its own right: intellectually it sits between the Thinking and Instruct base models, with a lot of good traces from cloud models that it can activate at inference.
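
To make the comparison concrete, here is a small sketch that works directly on the numbers listed above. The source does not name the metric columns, so they are addressed by position only; the helper names `column_deltas` and `classify_columns` are hypothetical.

```python
# Scores copied from the tables above (qx86-hi rows where stated).
# Column names are not given in the source, so columns are positional.
SCORES = {
    "Element8": [0.552, 0.763, 0.875, 0.694, 0.424, 0.764, 0.653],
    "Hermes": [0.430, 0.490, 0.710, 0.608, 0.372, 0.733, 0.627],
    "Merged": [0.546, 0.747, 0.870, 0.687, 0.432, 0.762, 0.653],
    "Instruct-2507": [0.447, 0.593, 0.843, 0.448, 0.390, 0.690, 0.554],
    "Thinking-2507": [0.372, 0.414, 0.625, 0.518, 0.366, 0.698, 0.612],
}


def column_deltas(a, b):
    """Per-column difference a - b, rounded to three decimals."""
    return [round(x - y, 3) for x, y in zip(a, b)]


def classify_columns(target, ref_a, ref_b):
    """Label each column of `target` relative to the two reference rows."""
    labels = []
    for t, a, b in zip(target, ref_a, ref_b):
        low, high = min(a, b), max(a, b)
        if t < low:
            labels.append("below")
        elif t > high:
            labels.append("above")
        else:
            labels.append("between")
    return labels


if __name__ == "__main__":
    # Where does Hermes sit relative to the two base models?
    print(classify_columns(SCORES["Hermes"],
                           SCORES["Instruct-2507"],
                           SCORES["Thinking-2507"]))
    # How far is the merged model from its Element8 parent?
    print(column_deltas(SCORES["Merged"], SCORES["Element8"]))
```

On these numbers, Hermes falls between the two base models on four of the seven columns and above both on the other three, while the merged model stays within about 0.016 of Element8 on every column.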

I used different models as baselines for scaffolding: Jan and RA-SFT already have impressive metrics that lift the base model and provide structure.

Qwen3-4B-RA-SFT
qx86-hi   0.515,0.715,0.856,0.615,0.436,0.754,0.629

Jan-v1-2509
qx86-hi   0.435,0.540,0.729,0.588,0.388,0.730,0.633

Starting from a plain baseline, reaching a long arc is hard work, so a few models that already have good skills help the merge.

For me, the first merge always uses a 4.3.2.1 ratio: a strong base, a good instruct model, and a couple of thinking models, one with long reach. This creates the "room", so to speak. On top of this, successive multislerps form a multidimensional matrix where the models live, with parts of them left un-merged. Metrics go up with every merge, until you reach the top:
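
As a rough illustration of the two operations mentioned here, the sketch below shows a weighted average and a two-vector spherical interpolation (slerp) on plain Python lists. It rests on several assumptions: the 4.3.2.1 ratio is read as relative weights 4:3:2:1, the real merges operate on full model tensors with dedicated merge tooling that the text does not name, and a multislerp generalizes slerp to more than two models, which this sketch does not attempt. The function names are hypothetical.

```python
import math


def weighted_merge(tensors, weights):
    """Element-wise weighted average; weights are normalized to sum to 1."""
    total = float(sum(weights))
    size = len(tensors[0])
    return [
        sum(w * t[i] for w, t in zip(weights, tensors)) / total
        for i in range(size)
    ]


def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b at fraction t."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined for a zero vector: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-6:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_omega = math.sin(omega)
    w_a = math.sin((1 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # Four toy "models" with one parameter each, merged at 4:3:2:1.
    print(weighted_merge([[10.0], [0.0], [0.0], [0.0]], [4, 3, 2, 1]))
    # Halfway between two orthogonal unit vectors.
    print(slerp([1.0, 0.0], [0.0, 1.0], 0.5))
```

Unlike the weighted average, slerp follows the arc between the two vectors, so interpolating between two unit vectors yields another unit vector rather than a shorter one.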

Qwen3-4B-Engineer3x
qx86-hi   0.615,0.835,0.852,0.745,0.420,0.780,0.704
Qwen3-4B-Engineer3x-F32
qx86-hi   0.613,0.842,0.855,0.748,0.428,0.781,0.709
Qwen3-4B-Engineer3x2
qx86-hi   0.619,0.829,0.850,0.747,0.422,0.776,0.690

Any inference with arc numbers like '0.613,0.842' will be magic. Those are the models that "built the station", so to speak.
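
For a quick overall view of the scaffolding and top-tier models, the sketch below ranks the qx86-hi rows quoted in this section by their unweighted mean. The mean is an illustrative choice, not a metric the author uses, and `rank_by_mean` is a hypothetical helper name.

```python
# qx86-hi rows copied from this section; columns are positional because
# the source does not name them.
QX86_HI = {
    "Qwen3-4B-Engineer3x": [0.615, 0.835, 0.852, 0.745, 0.420, 0.780, 0.704],
    "Qwen3-4B-Engineer3x-F32": [0.613, 0.842, 0.855, 0.748, 0.428, 0.781, 0.709],
    "Qwen3-4B-Engineer3x2": [0.619, 0.829, 0.850, 0.747, 0.422, 0.776, 0.690],
    "Qwen3-4B-Element8-Eva-Hermes-Heretic": [0.546, 0.747, 0.870, 0.687, 0.432, 0.762, 0.653],
    "Qwen3-4B-RA-SFT": [0.515, 0.715, 0.856, 0.615, 0.436, 0.754, 0.629],
    "Jan-v1-2509": [0.435, 0.540, 0.729, 0.588, 0.388, 0.730, 0.633],
}


def rank_by_mean(rows):
    """Return model names sorted from highest to lowest mean score."""
    return sorted(rows, key=lambda name: sum(rows[name]) / len(rows[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_by_mean(QX86_HI):
        scores = QX86_HI[name]
        print(f"{name:40s} {sum(scores) / len(scores):.3f}")
```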

-G