Libraries

How to use KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b")
# Llama-3.3 is a text-only causal language model, so load it with AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
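
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are illustrative, and it assumes the vLLM server started above is listening on `localhost:8000` and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request

MODEL_ID = "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call, once the server is up:
#   print(ask("What is the capital of France?"))
```

For the SGLang server described further down, pass `base_url="http://localhost:30000"` instead.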

Use Docker

docker model run hf.co/KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b

SGLang

How to use KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b with Docker Model Runner:
```
docker model run hf.co/KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

gpt4o-distil-paperwitch-abliteration-L33-70b

This is a targeted abliteration of trentmkelly/gpt-4o-distil-Llama-3.3-70B-Instruct.

Methodology

Previous abliteration attempts on Llama-3.3 70B models resulted in regressions on the UGI Leaderboard. Specifically, the NatInt (Natural Intelligence), Textbook, and World Model scores dropped significantly.

We suspect this degradation occurs because the "refusal" vectors in Llama-3.3 are heavily entangled with factual knowledge and reasoning capabilities located in the MLP layers. When the MLP is ablated to remove refusals, "Textbook" knowledge is lost as collateral damage.

This version uses a constrained optimization strategy, run through a custom build of Heretic, that aims to mitigate this issue:

MLP Preservation: The optimization was constrained to effectively ignore MLP layers (down_proj weights < 0.05) to preserve knowledge and reasoning capabilities.
Attention Targeting: Refusal removal was offloaded to the Attention layers (o_proj), with weights forced between 1.0 and 2.0.
Winsorization: Applied at the 0.95 quantile to mitigate the impact of Llama-3's massive activation outliers on vector calculation.
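
The three ideas above can be illustrated with a small pure-Python toy. This is a sketch under stated assumptions, not Heretic's implementation: the function names are hypothetical, winsorization is assumed to clip each activation vector symmetrically at the 0.95 quantile of its absolute values, the refusal direction is assumed to be a difference of means, and ablation is assumed to take the form W' = W - s * r (r^T W) for an output projection W, unit direction r, and weight s.

```python
import math


def winsorize(values, quantile=0.95):
    """Clamp every value to +/- the chosen quantile of the absolute values."""
    magnitudes = sorted(abs(v) for v in values)
    # Nearest-rank quantile: index of the smallest magnitude covering `quantile` of the data.
    rank = max(0, math.ceil(quantile * len(magnitudes)) - 1)
    limit = magnitudes[rank]
    return [max(-limit, min(limit, v)) for v in values]


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts, quantile=0.95):
    """Unit difference-of-means direction computed from winsorized activations."""
    harmful = mean_vector([winsorize(a, quantile) for a in harmful_acts])
    harmless = mean_vector([winsorize(a, quantile) for a in harmless_acts])
    diff = [h - b for h, b in zip(harmful, harmless)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def ablate(weight, direction, strength):
    """Return W - strength * r (r^T W), where rows of W index output dimensions."""
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the refusal direction.
    projection = [sum(direction[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [
        [weight[i][j] - strength * direction[i] * projection[j] for j in range(cols)]
        for i in range(rows)
    ]
```

Under this form, the component along r after ablation is (1 - s) times the original. A weight near 0.02 (the MLP setting) leaves the matrix almost untouched, a weight of 1.0 removes the component entirely, and a weight near 2.0 (the attention setting) reflects it.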

Heretic Parameters (Trial 198)

Parameter	Value	Note
direction_index	Per layer	Distributed intervention
attn.o_proj.max_weight	1.99	High Attention Ablation
attn.o_proj.max_weight_position	49.30
attn.o_proj.min_weight	1.85
attn.o_proj.min_weight_distance	36.64
mlp.down_proj.max_weight	0.02	Knowledge Preservation (Near Zero)
mlp.down_proj.max_weight_position	73.63
mlp.down_proj.min_weight	0.02
mlp.down_proj.min_weight_distance	43.65
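
To make the table easier to read, the sketch below turns the four parameters of each component into a per-layer ablation weight. It assumes a kernel that peaks at `max_weight_position`, falls off linearly to `min_weight` at a distance of `min_weight_distance` layers, and is zero beyond that. It also treats the tabulated `min_weight` as an absolute value. The class and function names are hypothetical; the layer count of 80 is the depth of Llama-3.3-70B.

```python
from dataclasses import dataclass


@dataclass
class KernelParams:
    max_weight: float
    max_weight_position: float
    min_weight: float
    min_weight_distance: float


def layer_weight(layer_index, params):
    """Ablation weight for one layer under the assumed linear-falloff kernel."""
    distance = abs(layer_index - params.max_weight_position)
    if distance > params.min_weight_distance:
        return 0.0  # layer lies outside the kernel and is left unmodified
    falloff = distance / params.min_weight_distance
    return params.max_weight + falloff * (params.min_weight - params.max_weight)


# Values copied from the Trial 198 table.
ATTN_O_PROJ = KernelParams(1.99, 49.30, 1.85, 36.64)
MLP_DOWN_PROJ = KernelParams(0.02, 73.63, 0.02, 43.65)
NUM_LAYERS = 80  # decoder layers in Llama-3.3-70B

if __name__ == "__main__":
    for layer in range(NUM_LAYERS):
        attn = layer_weight(layer, ATTN_O_PROJ)
        mlp = layer_weight(layer, MLP_DOWN_PROJ)
        print(f"layer {layer:2d}  attn.o_proj={attn:.3f}  mlp.down_proj={mlp:.3f}")
```

Under these assumptions the attention weight stays between 1.85 and 1.99 wherever the kernel is active, while the MLP weight never exceeds 0.02.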

Reproducibility

Constraints are not currently part of standard Heretic. Reproducing this run requires the pull request that adds them.

Command Used:

heretic --model trentmkelly/gpt-4o-distil-Llama-3.3-70B-Instruct \
 --orthogonalize-direction \
 --row-normalization FULL \
 --winsorization-quantile 0.95 \
 --constraints.layer-end-fraction 0.75 \
 --constraints.mlp.max-weight-min 0.0 \
 --constraints.mlp.max-weight-max 0.05 \
 --constraints.attention.max-weight-min 1.0 \
 --constraints.attention.max-weight-max 2.0 \
 --n-trials 200 \
 --batch-size 128 #  Not strictly needed
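
As a quick sanity check, the reported Trial 198 values can be compared against the bounds passed on the command line. The sketch assumes that the `--constraints.mlp.*` flags bound `mlp.down_proj.max_weight` and that the `--constraints.attention.*` flags bound `attn.o_proj.max_weight`, which matches the Methodology description. The dictionary layout and the `violations` helper are illustrative only.

```python
# Bounds taken from the --constraints.* flags in the command above.
CONSTRAINTS = {
    "mlp.down_proj.max_weight": (0.0, 0.05),
    "attn.o_proj.max_weight": (1.0, 2.0),
}

# Values reported in the Heretic Parameters (Trial 198) table.
TRIAL_198 = {
    "mlp.down_proj.max_weight": 0.02,
    "attn.o_proj.max_weight": 1.99,
}


def violations(params, constraints):
    """Return the names of parameters that fall outside their [low, high] bounds."""
    return [
        name
        for name, (low, high) in constraints.items()
        if not low <= params[name] <= high
    ]


print(violations(TRIAL_198, CONSTRAINTS))  # prints [] because both values are in range
```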

Evaluation

Metric	This Model	Original Model
KL Divergence	0.0220	0
Refusals	20/100	98/100

KL Divergence: A score of 0.0220 indicates that the output distribution deviates only slightly from the original model's, which suggests that its factual and "Textbook" capabilities are largely preserved compared to standard unconstrained abliteration.
Trade-off: This method accepts a moderate refusal rate (20/100) as a calculated trade-off in exchange for maintaining high structural and semantic integrity in the MLP layers.
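
For readers unfamiliar with the two metrics, here is a minimal sketch of how they can be computed. It assumes KL divergence is measured between next-token probability distributions of the original and modified models on the same prompts, and that refusals are counted by substring matching. The marker list and all function names are hypothetical and do not come from Heretic.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def mean_kl(original_logits, modified_logits):
    """Average KL(original || modified) over a batch of prompts."""
    pairs = list(zip(original_logits, modified_logits))
    return sum(kl_divergence(softmax(o), softmax(m)) for o, m in pairs) / len(pairs)


# Hypothetical marker list; a real evaluation would use a more careful classifier.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def refusal_count(responses):
    """Count responses that contain at least one refusal marker."""
    return sum(any(marker in r.lower() for marker in REFUSAL_MARKERS) for r in responses)
```

Comparing a model with itself gives a KL divergence of exactly 0, which is why the Original Model column reads 0.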

Disclaimer

A refusal rate of 20/100 is somewhat high for an abliterated model. This is likely because the base model has refusal behavior deeply embedded within its MLP layers.

Because our constrained methodology intentionally protects the MLPs to prevent the degradation of textbook knowledge and intelligence, we cannot entirely scrub these deep-rooted refusals without causing collateral brain damage to the model.

Safetensors

Model size: 71B params
Tensor type: F16

Model tree for KaraKaraWitch/gpt4o-distil-paperwitch-abliteration-L33-70b

Base model: meta-llama/Llama-3.1-70B
Finetuned: meta-llama/Llama-3.3-70B-Instruct
Adapter: trentmkelly/gpt-4o-distil-Llama-3.3-70B-Instruct
Finetuned: this model
Quantizations: 2 models