---
library_name: transformers
license: other
license_link: https://huggingface.co/stepfun-ai/Step-3.7-Flash
pipeline_tag: image-text-to-text
tags:
- stepfun
- step-3.7
- flash
- heretic
- uncensored
- decensored
- abliterated
- bf16
- transformers
- autoround-ready
- awq-ready
- exl3-ready
- gguf-ready
- nvfp4-ready
base_model:
- stepfun-ai/Step-3.7-Flash
---

# Step-3.7-Flash-uncensored-abliterated-heretic-BF16

> NOTE: I have tested this model, and although its capabilities are intact, it still seems to respond with refusals. At least, this is what happens with its IQ4_XS GGUF quantization.

This is a **decensored BF16 full-weight** version of [`stepfun-ai/Step-3.7-Flash`](https://huggingface.co/stepfun-ai/Step-3.7-Flash), made using a Heretic-style gradient refusal-direction abliteration method inspired by [Heretic](https://github.com/p-e-w/heretic) and norm-preserving ablation work such as [Magnitude/Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration).

It was produced with a local gradient abliteration pass against the language model's refusal direction. The uploaded repository intentionally keeps the full HF/Transformers BF16 layout so it can be used later as a clean source for **GGUF, AutoRound, AWQ, EXL3, NVFP4, GPTQ, FP8, or other quantization workflows**.

---

## Summary

| Item | Value |
| :-- | :-- |
| Base model | [`stepfun-ai/Step-3.7-Flash`](https://huggingface.co/stepfun-ai/Step-3.7-Flash) |
| Release type | Full BF16 safetensors |
| Model class | `Step3p7ForConditionalGeneration` |
| Text model class | `Step3p5ForCausalLM` |
| Text layers | 45 |
| Hidden size | 4096 |
| Attention heads | 64 |
| Head dim | 128 |
| Max positions | 262144 |
| Vocab size | 128896 |
| MoE layers | 3–44 |
| Experts | 288 |
| Top-k experts | 8 |
| MoE intermediate size | 1280 |
| Dense FFN intermediate size | 11264 |
| Patch target | `model.layers.*.self_attn.o_proj.weight` |
| Patched text layers | 0–44 |
| Abliteration strength | `lambda = 0.1` |
| Stored tensor dtype | BF16 |
| Indexed parameter payload | 402,730,656,512 bytes |

---

## What changed?

The modification targets `self_attn.o_proj` weights in all 45 text layers. A refusal-associated direction was extracted by gradient backpropagation through the BF16 model, then projected out of the attention output projection weights with a small norm-preserving update.

In plain terms, the goal was to reduce excessive refusals, moralizing, policy-style deflections, and over-filtered responses while keeping the model close to the original Step-3.7-Flash behavior.

No tokenizer vocabulary, embedding table, architecture, vision encoder, or MLP/expert tensor was intentionally changed by the abliteration pass.
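The projected update can be illustrated with a minimal NumPy sketch. It rests on several assumptions. The refusal direction `r` is taken to be a 4096-dim vector in the hidden (output) space of `o_proj`. The update is assumed to be `W' = W - lambda * r (r^T W)` with `lambda = 0.1`. "Per-row norm preservation" (from the parameter table) is read as rescaling each row of the PyTorch-layout `(out_features, in_features)` weight back to its original L2 norm. `ablate_o_proj` is a hypothetical name, and this is not the bundled `apply_abliteration_inplace.py`.

```python
import numpy as np

def ablate_o_proj(weight: np.ndarray, direction: np.ndarray, lam: float = 0.1) -> np.ndarray:
    """Hypothetical sketch: remove `lam` of `direction` from the output space of an
    o_proj weight shaped (out_features, in_features), then restore each row's L2 norm."""
    w = weight.astype(np.float32)               # NumPy has no bfloat16; do the math in FP32
    r = direction.astype(np.float32)
    r = r / np.linalg.norm(r)                   # unit refusal direction in hidden space
    row_norms = np.linalg.norm(w, axis=1, keepdims=True)
    # Orthogonal projection scaled by lambda: W' = W - lam * r (r^T W)
    projected = w - lam * np.outer(r, r @ w)
    # Assumed reading of "per-row norm preservation": rescale rows to their old norms
    new_norms = np.linalg.norm(projected, axis=1, keepdims=True)
    scale = np.divide(row_norms, new_norms, out=np.ones_like(row_norms), where=new_norms > 0)
    return projected * scale                    # a real patch would cast back to BF16

# Toy stand-in for an o_proj weight (the real one would be (4096, 8192) if it maps
# 64 heads x 128 head dims to the 4096 hidden size).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)
r = rng.standard_normal(64)
w_new = ablate_o_proj(w, r)
r_unit = r / np.linalg.norm(r)
print(np.linalg.norm(r_unit @ w_new) / np.linalg.norm(r_unit @ w))  # below 1: r-component shrinks
```

Row norms are unchanged by construction, while the component of the layer's output along `r` is reduced by roughly `lambda`. The rescaling step is why the reduction is not exactly `1 - lambda`.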

---

## Abliteration parameters

| Parameter | Value |
| :-- | :--: |
| Method | gradient-based orthogonal / norm-preserving abliteration |
| Direction source | refusal/harm-trigger gradient prompt |
| Target module | `self_attn.o_proj` |
| Target tensor glob | `model.layers.*.self_attn.o_proj.weight` |
| Modified layers | 0–44 |
| Lambda | `0.1` |
| Weight norm handling | per-row norm preservation after projection |
| Gradient tensor count | 45 |
| Per-layer gradient tensor shape | `(1, 8, 4096)` |
| Direction extraction score | `-11.9375` |
| Refusal token ids used | `[43, 371, 679, 1664, 9332, 34614, 100477]` |
| Gradient norm range | `0.1069`–`31.875` |
| Mean gradient norm | `3.2397` |

Reproduction/support artifacts are included under [`heretic_artifacts/`](https://huggingface.co/ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16/tree/main/heretic_artifacts):

- `refusal_direction_gradients.pkl` — saved gradient/refusal directions used for the BF16 patch
- `apply_abliteration_inplace.py` — patch application script used for shard-wise in-place BF16 modification
- `extract_gradients.py` — gradient extraction script
- `memory_guard_v2.py` / `run_heavy.sh` — memory safety helpers used during local processing

These are included so the method can be inspected or repeated if needed. They are not required for normal inference or quantization.
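The patch only touches tensors matching `model.layers.*.self_attn.o_proj.weight`, so a shard-wise in-place pass only needs to rewrite the shards that store them. The sketch below groups matching names from the standard `weight_map` of `model.safetensors.index.json` by shard. `shards_to_patch` is a hypothetical helper, not the bundled script. For this checkpoint, the card reports 45 patched tensors (layers 0–44).

```python
import fnmatch
import json
from collections import defaultdict
from pathlib import Path

TARGET_GLOB = "model.layers.*.self_attn.o_proj.weight"

def shards_to_patch(weight_map, pattern=TARGET_GLOB):
    """Group tensor names that match `pattern` by the shard file that stores them."""
    by_shard = defaultdict(list)
    for name, shard in weight_map.items():
        if fnmatch.fnmatchcase(name, pattern):   # full-string, case-sensitive glob match
            by_shard[shard].append(name)
    return {shard: sorted(names) for shard, names in sorted(by_shard.items())}

if __name__ == "__main__":
    index = Path("model.safetensors.index.json")  # run from a local copy of the repo
    if index.is_file():
        plan = shards_to_patch(json.loads(index.read_text())["weight_map"])
        print(len(plan), "shards,", sum(map(len, plan.values())), "o_proj tensors")
```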

---

## Recoverability / requantization checklist

This repository should contain what is needed to rebuild downstream formats:

### Required for quantization

-`config.json`
-`model.safetensors.index.json`
- ✅ all indexed BF16 text shards: `model-00001.safetensors``model-00024.safetensors`
- ✅ indexed ViT shards: `model-vit-00001.safetensors`, `model-vit-00002.safetensors`
- ✅ tokenizer files: `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- ✅ chat template: `chat_template.jinja`
- ✅ custom code: `configuration_step3p7.py`, `modeling_step3p7.py`, `processing_step3.py`, `vision_encoder.py`
- ✅ method/reproduction artifacts in `heretic_artifacts/` (optional; not needed for inference or quantization)
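Before starting a long quantization job, a downloaded snapshot can be checked against this list. The sketch below is a hypothetical helper (`missing_files`). It checks the fixed files named above and every shard referenced by the index `weight_map`. It skips `heretic_artifacts/`, which is not needed for inference or quantization.

```python
import json
from pathlib import Path

REQUIRED = [
    "config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
    "configuration_step3p7.py",
    "modeling_step3p7.py",
    "processing_step3.py",
    "vision_encoder.py",
]

def missing_files(model_dir):
    """Return required files, then indexed shards, that are absent from model_dir."""
    root = Path(model_dir)
    missing = [f for f in REQUIRED if not (root / f).is_file()]
    index = root / "model.safetensors.index.json"
    if index.is_file():
        shards = set(json.loads(index.read_text())["weight_map"].values())
        missing += sorted(s for s in shards if not (root / s).is_file())
    return missing

# Usage: print(missing_files("Step-3.7-Flash-heretic-BF16") or "snapshot complete")
```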

### Expected downstream uses

This BF16 repo can be used as source for:

- GGUF conversion / llama.cpp quantization
- AutoRound
- AWQ
- EXL3 / exllamav3-style workflows
- NVFP4 / FP4 experiments
- GPTQ / FP8 / other post-training quantization methods
- additional LoRA or delta extraction experiments

For most quantizers, pass this repository ID as the HF model path and enable remote code (`trust_remote_code=True` or the tool's equivalent) if needed:

```bash
MODEL=ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16
```

---

## Example Transformers load

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16"

tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain gradient abliteration in one paragraph."}]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
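# do_sample=True is needed for temperature/top_p to take effect
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = out[0][inputs["input_ids"].shape[-1]:]
print(tok.decode(new_tokens, skip_special_tokens=False))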
```

> Step-3.7-Flash is very large: the indexed BF16 payload alone is about 403 GB (≈375 GiB), so BF16 loading requires a correspondingly large amount of GPU/CPU memory. For local inference, a quantized GGUF/EXL/AWQ/etc. build is recommended.
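A back-of-the-envelope estimate shows the scale. BF16 stores 2 bytes per parameter, so the indexed payload from the summary table implies roughly 201.4 billion stored parameters. The quantized figures below just scale that count by an assumed average bits per weight: Q8_0 is nominally 8.5 bpw and IQ4_XS 4.25 bpw in llama.cpp. Real files differ because some tensors stay at higher precision. Runtime memory also needs room for the KV cache and activations.

```python
PAYLOAD_BYTES = 402_730_656_512   # "Indexed parameter payload" from the summary table
BF16_BYTES_PER_PARAM = 2

def estimate_gib(bits_per_weight: float, params: int) -> float:
    """Approximate weight storage in GiB at a given average bits per weight."""
    return params * bits_per_weight / 8 / 2**30

params = PAYLOAD_BYTES // BF16_BYTES_PER_PARAM
print(f"parameters: {params:,}")  # parameters: 201,365,328,256
for label, bpw in [("BF16", 16), ("Q8_0 (~8.5 bpw)", 8.5), ("IQ4_XS (~4.25 bpw)", 4.25)]:
    print(f"{label:>20}: {estimate_gib(bpw, params):7.1f} GiB")
```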

---

## GGUF conversion note

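Use the StepFun/llama.cpp converter that supports Step-3.7. Example invocation:

```bash
# convert_hf_to_gguf.py expects a local model directory, so download the repo first
huggingface-cli download ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16 \
  --local-dir Step-3.7-Flash-heretic-BF16

python convert_hf_to_gguf.py \
  Step-3.7-Flash-heretic-BF16 \
  --outtype bf16 \
  --outfile step37-heretic-bf16.gguf

llama-quantize step37-heretic-bf16.gguf step37-heretic.IQ4_XS.gguf IQ4_XS
```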

In the original local environment, multi-GPU llama.cpp inference produced coherent output only when llama.cpp was built with `GGML_CUDA_NO_PEER_COPY=ON`.

---

## Indexed shard inventory

The active `model.safetensors.index.json` references 26 safetensors files:

| File | Size (bytes) |
| :-- | --: |
| `model-00001.safetensors` | 924,094,096 |
| `model-00002.safetensors` | 9,808,156,008 |
| `model-00003.safetensors` | 18,557,475,928 |
| `model-00004.safetensors` | 18,624,846,944 |
| `model-00005.safetensors` | 18,557,475,928 |
| `model-00006.safetensors` | 18,624,846,976 |
| `model-00007.safetensors` | 18,557,475,968 |
| `model-00008.safetensors` | 18,624,846,976 |
| `model-00009.safetensors` | 18,557,475,968 |
| `model-00010.safetensors` | 18,624,846,976 |
| `model-00011.safetensors` | 18,557,475,968 |
| `model-00012.safetensors` | 18,624,846,976 |
| `model-00013.safetensors` | 18,557,475,968 |
| `model-00014.safetensors` | 18,624,846,976 |
| `model-00015.safetensors` | 18,557,475,968 |
| `model-00016.safetensors` | 18,624,846,976 |
| `model-00017.safetensors` | 18,557,475,968 |
| `model-00018.safetensors` | 18,624,846,976 |
| `model-00019.safetensors` | 18,557,475,968 |
| `model-00020.safetensors` | 18,624,846,976 |
| `model-00021.safetensors` | 18,557,475,968 |
| `model-00022.safetensors` | 18,624,846,976 |
| `model-00023.safetensors` | 9,245,052,456 |
| `model-00024.safetensors` | 6,968,188,464 |
| `model-vit-00001.safetensors` | 1,613,990,904 |
| `model-vit-00002.safetensors` | 2,348,122,376 |

`model-00025.safetensors` and `model-00026.safetensors` are not referenced by the active index used here and are not required by this uploaded model layout.
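As a consistency check, the file sizes in this table should add up to slightly more than the indexed payload in the summary. Each safetensors file begins with an 8-byte header length and a JSON header before the raw tensor bytes. The sketch below copies the sizes from the table. It assumes the summary figure is the index's tensor-byte total, so the difference would be header bytes.

```python
SHARD_BYTES = {
    "model-00001.safetensors": 924_094_096,
    "model-00002.safetensors": 9_808_156_008,
    "model-00003.safetensors": 18_557_475_928,
    "model-00004.safetensors": 18_624_846_944,
    "model-00005.safetensors": 18_557_475_928,
    "model-00006.safetensors": 18_624_846_976,
    "model-00007.safetensors": 18_557_475_968,
    "model-00008.safetensors": 18_624_846_976,
    "model-00009.safetensors": 18_557_475_968,
    "model-00010.safetensors": 18_624_846_976,
    "model-00011.safetensors": 18_557_475_968,
    "model-00012.safetensors": 18_624_846_976,
    "model-00013.safetensors": 18_557_475_968,
    "model-00014.safetensors": 18_624_846_976,
    "model-00015.safetensors": 18_557_475_968,
    "model-00016.safetensors": 18_624_846_976,
    "model-00017.safetensors": 18_557_475_968,
    "model-00018.safetensors": 18_624_846_976,
    "model-00019.safetensors": 18_557_475_968,
    "model-00020.safetensors": 18_624_846_976,
    "model-00021.safetensors": 18_557_475_968,
    "model-00022.safetensors": 18_624_846_976,
    "model-00023.safetensors": 9_245_052_456,
    "model-00024.safetensors": 6_968_188_464,
    "model-vit-00001.safetensors": 1_613_990_904,
    "model-vit-00002.safetensors": 2_348_122_376,
}
PAYLOAD_BYTES = 402_730_656_512  # "Indexed parameter payload" from the summary table

total = sum(SHARD_BYTES.values())
overhead = total - PAYLOAD_BYTES   # presumably safetensors header bytes across all files
print(f"{len(SHARD_BYTES)} files, {total:,} bytes on disk")  # 26 files, 402,730,833,632 bytes on disk
print(f"header overhead: {overhead:,} bytes")               # header overhead: 177,120 bytes
```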

---

## Performance / benchmark status

Formal KL/refusal/MMLU tables have **not** yet been run for this Step-3.7-Flash release. To avoid inventing numbers, the benchmark fields are listed as pending.

| Metric | This model | Original model ([Step-3.7-Flash](https://huggingface.co/stepfun-ai/Step-3.7-Flash)) |
| :----- | :--------: | :---------------------------: |
| **KL divergence** | pending | 0 *(by definition)* |
| **Refusals** | pending | pending |
| **MMLU** | pending | pending |

A lower refusal count indicates fewer content restrictions, rejections, objections, pushbacks, lectures, censorship, softening, and deflections. A lower KL divergence indicates behavior closer to the original model baseline.
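The KL divergence row compares the next-token distributions of the two models on the same prompts. The minimal NumPy sketch below assumes you already have logits from both models at the same positions, for example the first generated token on a set of harmless prompts. It computes KL(P_orig || P_mod) per position and averages. The function names are illustrative.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_kl(orig_logits: np.ndarray, mod_logits: np.ndarray) -> float:
    """Mean KL(P_orig || P_mod) over rows of (num_positions, vocab) logit arrays."""
    log_p = log_softmax(orig_logits.astype(np.float64))
    log_q = log_softmax(mod_logits.astype(np.float64))
    kl_per_row = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_row.mean())

# Identical models give 0 by definition, matching the "Original model" column.
logits = np.random.default_rng(0).standard_normal((4, 128896))  # vocab size from the summary
print(mean_kl(logits, logits))  # 0.0
```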

### MMLU test results

MMLU has not yet been run for this release. Once measured, this section should include original-vs-heretic totals, accuracy, parse failures, and per-subject scores, following the same format used by comparable Heretic model cards.
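Once generations are available, the totals this section calls for (accuracy, parse failures, per-subject scores) can be tallied with a small helper. The sketch assumes a hypothetical record format `(subject, gold_letter, model_output)` and a deliberately simple parsing rule. The first standalone uppercase A–D in the output is taken as the answer, and anything else counts as a parse failure. This rule can misfire, for example on the article "A". Real harnesses use stricter or different extraction.

```python
import re
from collections import defaultdict

ANSWER_RE = re.compile(r"\b([ABCD])\b")   # first standalone A-D letter (simplistic)

def score_mmlu(records):
    """records: iterable of (subject, gold_letter, model_output) tuples."""
    per_subject = defaultdict(lambda: [0, 0])   # subject -> [correct, total]
    correct = total = parse_failures = 0
    for subject, gold, output in records:
        total += 1
        per_subject[subject][1] += 1
        match = ANSWER_RE.search(output)
        if match is None:
            parse_failures += 1                 # counted as wrong and as a parse failure
            continue
        if match.group(1) == gold:
            correct += 1
            per_subject[subject][0] += 1
    return {
        "total": total,
        "accuracy": correct / total if total else 0.0,
        "parse_failures": parse_failures,
        "per_subject": {s: c / n for s, (c, n) in per_subject.items()},
    }

# Usage: run once for the original model and once for this model, then compare.
```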

---

## Expected behavior

Compared with the base model, this version should generally exhibit:

- fewer refusals on benign requests that the base model over-filters
- less moralizing, policy language, and safety boilerplate
- more direct task completion
- similar architecture and tokenizer compatibility to the original

No formal refusal/KL/MMLU table is claimed yet for this release. Please run your own evaluations before deployment.
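For a quick local refusal check, you can use a marker-based counter. The marker list below is a hypothetical example, not the list used by Heretic or by this release. Substring matching will miss paraphrased refusals and can flag benign text, so treat the count as a rough signal only.

```python
# Hypothetical refusal markers; tune these for your own prompt set.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai", "i am unable")

def is_refusal(response: str) -> bool:
    """Case-insensitive substring check after normalizing curly apostrophes."""
    text = response.lower().replace("\u2019", "'")
    return any(marker in text for marker in REFUSAL_MARKERS)

def count_refusals(responses) -> int:
    return sum(is_refusal(r) for r in responses)

# Usage: count_refusals(outputs) on the same prompts for the base and abliterated models.
```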

---

## Limitations

- This is abliteration, not supervised fine-tuning or RLHF.
- It may reduce refusals but does not guarantee any specific behavior.
- It can affect calibration, safety behavior, and edge-case instruction following.
- Multimodal behavior has not been separately benchmarked after the text-path patch.
- Users should validate downstream quantizations independently.

---

## Safety and responsibility

This model is provided for research and experimentation with refusal-reduction / alignment-ablation methods. You are responsible for complying with applicable laws, platform rules, and the base model's license/terms.

---

## Related resources

Abliteration / refusal-direction removal references:

- [Orthogonal Reflection Bounded Ablation](https://huggingface.co/blog/grimjim/orthogonal-reflection-bounded-ablation)
- [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
- [Projected Abliteration](https://huggingface.co/blog/grimjim/projected-abliteration)
- [Exploring SLERP Abliteration](https://huggingface.co/blog/grimjim/exploring-slerp-abliteration)
- [Abliteration: uncensor any LLM without retraining](https://huggingface.co/blog/mlabonne/abliteration)
- [Heretic GitHub repository / method development](https://github.com/p-e-w/heretic)
- [Heretic PR #196](https://github.com/p-e-w/heretic/pull/196)
- [Heretic PR #211](https://github.com/p-e-w/heretic/pull/211)
- [Heretic PR #326](https://github.com/p-e-w/heretic/pull/326)
- [Heretic PR #332](https://github.com/p-e-w/heretic/pull/332)
- [Heretic issue #221](https://github.com/p-e-w/heretic/issues/221)
- [Heretic issue #236](https://github.com/p-e-w/heretic/issues/236)
- [Heretic issue #288](https://github.com/p-e-w/heretic/issues/288)
- [Heretic issue #339](https://github.com/p-e-w/heretic/issues/339)
- [UnstableLlama/heretic PR #35](https://github.com/UnstableLlama/heretic/pull/35)

---

## Attribution

- Base model: [`stepfun-ai/Step-3.7-Flash`](https://huggingface.co/stepfun-ai/Step-3.7-Flash)
- Method inspiration: Heretic-style refusal direction ablation and norm-preserving projection methods
- Modified/uploaded by: `ibrahimkettaneh`