Instructions for using Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
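The pipeline call above returns a nested structure rather than a plain string. The following is a minimal sketch of pulling out the reply, assuming the layout that recent Transformers versions return for chat-style input (a list with one dict whose `generated_text` holds the full conversation). The helper name `last_assistant_reply` and the placeholder reply text are illustrative and not part of the library.

```python
def last_assistant_reply(pipeline_output):
    """Return the assistant text from a chat-style text-generation result.

    Assumes pipeline_output looks like:
    [{"generated_text": [{"role": ..., "content": ...}, ...]}]
    """
    conversation = pipeline_output[0]["generated_text"]
    # Walk backwards so the most recent assistant turn is found first.
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Illustrative structure only: the reply below is a placeholder,
# not real model output.
example_output = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<model reply>"},
]}]
print(last_assistant_reply(example_output))
```

With a real pipeline this would be called as `last_assistant_reply(pipe(messages))`. If the installed version returns a different structure, the indexing would need adjusting.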

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1")
model = AutoModelForCausalLM.from_pretrained("Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
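The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns an OpenAI-style response body. The helper names `build_chat_request`, `extract_reply`, and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (without sending) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion body."""
    return response_body["choices"][0]["message"]["content"]


def chat(prompt, base_url="http://localhost:8000"):
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request) as response:
        return extract_reply(json.load(response))


# Requires the server to be running:
# print(chat("What is the capital of France?"))
```

Because the base URL is a parameter, the helper should also work against any other server exposing the same OpenAI-compatible route on a different port.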

Use Docker

docker model run hf.co/Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1

SGLang

How to use Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1 with Docker Model Runner:
```
docker model run hf.co/Goekdeniz-Guelmez/Josiefied-Qwen3-8B-abliterated-v1
```

Thank you

by snapo - opened May 10, 2025

Discussion

snapo

May 10, 2025

Thank you very very much for the abliterated model! Would be interesting to see if it does better in benchmarks without the lobotomization :-)

I read that you are also working on abliterating the multimodal models. That might get a little difficult, but I think it might be possible to lock the input and output and do only one expert at a time. (That's just how I would go about it.)

After you finish abliterating the other Qwen models and the MoE models, would you mind sharing the code you used?

Best regards

Goekdeniz-Guelmez

Owner May 10, 2025

Thanks for the great feedback! The Gemma 3 models (with the vision adapter) have actually been abliterated, but I lost the original abliterated-only versions after the fine-tune. I fine-tuned most of these models on custom family data and to work specifically with OpenAI. As for the code, I used a different, custom algorithm that I developed. It will be released along with a research paper, but that will take a long time (Q4 this year at the earliest).
