Instructions to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly (Qwen3-4B is a text-only causal LM)
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic")
    model = AutoModelForCausalLM.from_pretrained("nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    # Apply the chat template and move the tensors to the model's device
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic
- SGLang
How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic with Docker Model Runner:
docker model run hf.co/nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic
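The curl calls shown for vLLM and SGLang can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and it assumes a server is already running locally on the port used in the commands above (8000 for vLLM, 30000 for SGLang) and returns the usual OpenAI-style response shape.

```python
import json
import urllib.request

MODEL_ID = "nightmedia/Qwen3-4B-Element8-Eva-Hermes-Heretic"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt and return the assistant's reply text.

    Assumes the OpenAI-style response layout: choices[0].message.content.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a vLLM server running, `ask("What is the capital of France?")` sends the same request as the curl example; for SGLang, pass `base_url="http://localhost:30000"`.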
Qwen3-4B-Element8-Eva-Hermes-Heretic
This is a model merge between:
- nightmedia/Qwen3-4B-Element8-Eva-Heretic
- ZeroXClem/Qwen3-4B-Sky-High-Hermes
ZeroXClem/Qwen3-4B-Sky-High-Hermes
This model is an advanced evolution of ZeroXClem/Qwen3-4B-Hermes-Axion-Pro, combining multiple state-of-the-art Heretic Abliterated reasoning experts with Claude 4.5, Gemini 3, Opus 3, and Haiku distillations — all under a finely tuned 262,144 token context window.
Both models share a lot of common traits, with a few model differences in the merge.
Element8 0.552,0.763,0.875,0.694,0.424,0.764,0.653
Hermes 0.430,0.490,0.710,0.608,0.372,0.733,0.627
Qwen3-4B-Element8-Eva-Hermes-Heretic
qx86-hi 0.546,0.747,0.870,0.687,0.432,0.762,0.653
Qwen3-4B-Instruct-2507
qx86-hi 0.447,0.593,0.843,0.448,0.390,0.690,0.554
Qwen3-4B-Thinking-2507
qx86-hi 0.372,0.414,0.625,0.518,0.366,0.698,0.612
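To make the comparison concrete, here is a small sketch that parses the score rows above and prints the per-column difference between the merge and each parent. The numbers are copied from the rows above; the seven columns are left unnamed because this section does not label them, and the helper names are hypothetical.

```python
# Score rows copied verbatim from the listing above.
SCORES = {
    "Element8": "0.552,0.763,0.875,0.694,0.424,0.764,0.653",
    "Hermes": "0.430,0.490,0.710,0.608,0.372,0.733,0.627",
    "Element8-Eva-Hermes-Heretic": "0.546,0.747,0.870,0.687,0.432,0.762,0.653",
}


def parse_row(row):
    """Turn a comma-separated score row into a list of floats."""
    return [float(value) for value in row.split(",")]


def deltas(candidate, reference):
    """Per-column difference (candidate minus reference), rounded to 3 places."""
    return [round(c - r, 3) for c, r in zip(candidate, reference)]


parsed = {name: parse_row(row) for name, row in SCORES.items()}
merged = parsed["Element8-Eva-Hermes-Heretic"]

vs_element8 = deltas(merged, parsed["Element8"])
vs_hermes = deltas(merged, parsed["Hermes"])

print("merge - Element8:", vs_element8)
print("merge - Hermes:  ", vs_hermes)
```

Read this way, the merge stays within 0.016 of Element8 on every column while sitting above Hermes on all seven.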
So, with numbers, let's talk shop.
There is nothing wrong with Sky-High-Hermes; the merge was successful in its own right: intellectually it sits between the Thinking and Instruct base models, with a lot of good traces from cloud models it can activate on inference.
I used different models as the baseline for scaffolding: Jan and RA-SFT already have impressive metrics that lift the base model and provide structure.
Qwen3-4B-RA-SFT
qx86-hi 0.515,0.715,0.856,0.615,0.436,0.754,0.629
Jan-v1-2509
qx86-hi 0.435,0.540,0.729,0.588,0.388,0.730,0.633
Reaching long arc from a bare baseline is hard work, so a few models with good skills help in the merge.
For me, the first merge is always a 4.3.2.1 ratio of a strong base, a good instruct, and a couple of thinking models, one with long reach. This creates the "room", so to speak. On this, successive multislerps form a multidimensional matrix where the models live, parts of them un-merged. Metrics go up with every merge, until you reach the top:
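As a rough illustration of the two ideas in this paragraph, the sketch below normalizes a ratio into merge weights and applies spherical linear interpolation (slerp) to two toy vectors. It makes several assumptions: "4.3.2.1" is read as the weights 4:3:2:1, the plain weighted average stands in for whatever the actual first-merge method is, the vectors are made-up stand-ins for model parameters, and the two-vector slerp is only the building block, not the multislerp procedure itself. The merge tool and its settings are not stated in this card.

```python
import math


def normalize_ratio(ratio):
    """Scale a ratio such as [4, 3, 2, 1] so the weights sum to 1."""
    total = sum(ratio)
    return [r / total for r in ratio]


def weighted_average(vectors, weights):
    """Plain linear blend of equally sized vectors (illustrative stand-in)."""
    size = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(size)]


def slerp(a, b, t):
    """Spherical linear interpolation between vectors a and b at fraction t."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding drift
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel (or antiparallel) vectors: fall back to linear blend.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Hypothetical toy "parameters" for base, instruct, and two thinking models.
toy_models = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = normalize_ratio([4, 3, 2, 1])
blended = weighted_average(toy_models, weights)
halfway = slerp([1.0, 0.0], [0.0, 1.0], 0.5)

print("weights:", weights)
print("blended:", blended)
print("halfway:", halfway)
```

Unlike the linear blend, slerp moves along the arc between the two vectors, so the halfway point between two unit vectors keeps unit length.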
Qwen3-4B-Engineer3x
qx86-hi 0.615,0.835,0.852,0.745,0.420,0.780,0.704
Qwen3-4B-Engineer3x-F32
qx86-hi 0.613,0.842,0.855,0.748,0.428,0.781,0.709
Qwen3-4B-Engineer3x2
qx86-hi 0.619,0.829,0.850,0.747,0.422,0.776,0.690
Any inference with arc numbers like '0.613,0.842' will be magic. Those are the models that "built the station", so to speak.
-G