---
library_name: transformers
license: other
license_link: https://huggingface.co/stepfun-ai/Step-3.7-Flash
pipeline_tag: image-text-to-text
tags:
- stepfun
- step-3.7
- flash
- heretic
- uncensored
- decensored
- abliterated
- bf16
- transformers
- autoround-ready
- awq-ready
- exl3-ready
- gguf-ready
- nvfp4-ready
base_model:
- stepfun-ai/Step-3.7-Flash
---
# Step-3.7-Flash-uncensored-abliterated-heretic-BF16
> NOTE: I have tested this and, although its capabilities are intact, it still seems to respond with refusals. At least, this is what happens with the IQ4_XS GGUF quantization of it.
This is a **decensored BF16 full-weight** version of [`stepfun-ai/Step-3.7-Flash`](https://huggingface.co/stepfun-ai/Step-3.7-Flash), made using a Heretic-style gradient refusal-direction abliteration method inspired by [Heretic](https://github.com/p-e-w/heretic) and norm-preserving ablation work such as [Magnitude/Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration).
It was produced with a local gradient abliteration pass against the language model's refusal direction. The uploaded repository intentionally keeps the full HF/Transformers BF16 layout so it can be used later as a clean source for **GGUF, AutoRound, AWQ, EXL3, NVFP4, GPTQ, FP8, or other quantization workflows**.
---
## Summary
| Item | Value |
| :-- | :-- |
| Base model | [`stepfun-ai/Step-3.7-Flash`](https://huggingface.co/stepfun-ai/Step-3.7-Flash) |
| Release type | Full BF16 safetensors |
| Model class | `Step3p7ForConditionalGeneration` |
| Text model class | `Step3p5ForCausalLM` |
| Text layers | 45 |
| Hidden size | 4096 |
| Attention heads | 64 |
| Head dim | 128 |
| Max positions | 262144 |
| Vocab size | 128896 |
| MoE layers | 3–44 |
| Experts | 288 |
| Top-k experts | 8 |
| MoE intermediate size | 1280 |
| Dense FFN intermediate size | 11264 |
| Patch target | `model.layers.*.self_attn.o_proj.weight` |
| Patched text layers | 0–44 |
| Abliteration strength | `lambda = 0.1` |
| Stored tensor dtype | BF16 |
| Indexed parameter payload | 402,730,656,512 bytes |
---
## What changed?
The modification targets `self_attn.o_proj` weights in all 45 text layers. A refusal-associated direction was extracted by gradient backpropagation through the BF16 model, then projected out of the attention output projection weights with a small norm-preserving update.
In plain terms, the goal was to reduce excessive refusals, moralizing, policy-style deflections, and over-filtered responses while keeping the model close to the original Step-3.7-Flash behavior.
No tokenizer vocabulary, embedding table, architecture, vision encoder, or MLP/expert tensor was intentionally changed by the abliteration pass.
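
To make the update concrete, here is a minimal pure-Python sketch of a scaled projection followed by per-row norm restoration. It is not the repository's `apply_abliteration_inplace.py`. The function name `ablate_direction`, the assumption that the refusal direction has already been reduced to a single vector in the output (residual) space, and the exact order of operations are illustrative assumptions. A real run would use tensor operations on the BF16 shards.

```python
import math


def ablate_direction(W, r, lam=0.1):
    """Remove a fraction `lam` of direction `r` from the output space of W,
    then rescale every row back to its original L2 norm.

    W is a list of rows (out_features x in_features); r has out_features entries.
    """
    norm_r = math.sqrt(sum(x * x for x in r))
    u = [x / norm_r for x in r]  # unit direction in output space
    n_out, n_in = len(W), len(W[0])

    # u^T W: how strongly each input column writes along u.
    proj = [sum(u[i] * W[i][j] for i in range(n_out)) for j in range(n_in)]

    patched = []
    for i in range(n_out):
        # Scaled projection: W' = W - lam * u (u^T W), one row at a time.
        row = [W[i][j] - lam * u[i] * proj[j] for j in range(n_in)]
        old_norm = math.sqrt(sum(x * x for x in W[i]))
        new_norm = math.sqrt(sum(x * x for x in row))
        scale = old_norm / new_norm if new_norm > 0 else 1.0
        patched.append([x * scale for x in row])  # restore the row norm
    return patched


W = [[1.0, 2.0], [3.0, 4.0]]
W_patched = ablate_direction(W, r=[1.0, 1.0], lam=0.1)
```

With `lam = 0.1` only a small part of the component along the direction is removed, which matches the stated aim of staying close to the original weights.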
---
## Abliteration parameters
| Parameter | Value |
| :-- | :--: |
| Method | gradient-based orthogonal / norm-preserving abliteration |
| Direction source | refusal/harm-trigger gradient prompt |
| Target module | `self_attn.o_proj` |
| Target tensor glob | `model.layers.*.self_attn.o_proj.weight` |
| Modified layers | 0–44 |
| Lambda | `0.1` |
| Weight norm handling | per-row norm preservation after projection |
| Gradient tensor count | 45 |
| Per-layer gradient tensor shape | `(1, 8, 4096)` |
| Direction extraction score | `-11.9375` |
| Refusal token ids used | `[43, 371, 679, 1664, 9332, 34614, 100477]` |
| Gradient norm range | `0.1069`–`31.875` |
| Mean gradient norm | `3.2397` |
Reproduction/support artifacts are included under [`heretic_artifacts/`](https://huggingface.co/ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16/tree/main/heretic_artifacts):
- `refusal_direction_gradients.pkl` — saved gradient/refusal directions used for the BF16 patch
- `apply_abliteration_inplace.py` — patch application script used for shard-wise in-place BF16 modification
- `extract_gradients.py` — gradient extraction script
- `memory_guard_v2.py` / `run_heavy.sh` — memory safety helpers used during local processing
These are included so the method can be inspected or repeated if needed. They are not required for normal inference or quantization.
---
## Recoverability / requantization checklist
This repository should contain what is needed to rebuild downstream formats:
### Required for quantization
- ✅ `config.json`
- ✅ `model.safetensors.index.json`
- ✅ all indexed BF16 text shards: `model-00001.safetensors` … `model-00024.safetensors`
- ✅ indexed VIT shards: `model-vit-00001.safetensors`, `model-vit-00002.safetensors`
- ✅ tokenizer files: `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- ✅ chat template: `chat_template.jinja`
- ✅ custom code: `configuration_step3p7.py`, `modeling_step3p7.py`, `processing_step3.py`, `vision_encoder.py`
- ✅ method/reproduction artifacts in `heretic_artifacts/` (included for inspection; not needed for quantization itself)
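
The checklist can be turned into a quick pre-flight check on a local download. The sketch below uses only the file names listed above. The helper name `missing_files` is hypothetical, and it assumes the repo has been downloaded into a single flat directory.

```python
from pathlib import Path

# File names taken from the checklist above.
REQUIRED = [
    "config.json",
    "model.safetensors.index.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
    "configuration_step3p7.py",
    "modeling_step3p7.py",
    "processing_step3.py",
    "vision_encoder.py",
]
# model-00001 … model-00024 plus the two VIT shards.
SHARDS = [f"model-{i:05d}.safetensors" for i in range(1, 25)]
SHARDS += [f"model-vit-{i:05d}.safetensors" for i in (1, 2)]


def missing_files(model_dir):
    """Return the checklist files that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED + SHARDS if not (root / name).is_file()]
```

An empty list from `missing_files("/path/to/local/download")` means every checklist file is present. It does not verify file contents or sizes.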
### Expected downstream uses
This BF16 repo can be used as source for:
- GGUF conversion / llama.cpp quantization
- AutoRound
- AWQ
- EXL3 / exllamav3-style workflows
- NVFP4 / FP4 experiments
- GPTQ / FP8 / other post-training quantization methods
- additional LoRA or delta extraction experiments
For most quantizers, use this repo exactly as the HF model path and enable remote code if needed:
```bash
MODEL=ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16
```
---
## Example Transformers load
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16"

# trust_remote_code is needed because the repo ships custom modeling code.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,  # keep the stored BF16 weights as-is
    device_map="auto",           # spread layers across available devices
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain gradient abliteration in one paragraph."}]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect.
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95)
print(tok.decode(out[0], skip_special_tokens=False))
```
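> Step-3.7-Flash is very large: the indexed BF16 payload alone is 402,730,656,512 bytes (about 403 GB), so BF16 loading needs at least that much combined GPU and CPU memory. For local inference, a quantized build (GGUF, EXL3, AWQ, etc.) is recommended.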
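
As a rough sizing aid, the payload figure from the Summary table can be converted into a parameter count and a ballpark quantized size. This is a back-of-the-envelope sketch. It assumes every indexed tensor is stored as 2-byte BF16, and the 4.25 bits-per-weight figure is only the nominal IQ4_XS rate. Real GGUF files mix tensor types and add metadata, so actual sizes will differ.

```python
PAYLOAD_BYTES = 402_730_656_512  # "Indexed parameter payload" from the Summary table
BYTES_PER_BF16 = 2               # assumption: all indexed tensors are BF16


def param_count(payload_bytes=PAYLOAD_BYTES):
    """Approximate number of stored parameters."""
    return payload_bytes // BYTES_PER_BF16


def quantized_bytes(n_params, bits_per_weight):
    """Idealized size if every parameter used `bits_per_weight` bits."""
    return n_params * bits_per_weight / 8


n = param_count()
print(f"parameters:           {n:,}")
print(f"BF16 payload:         {PAYLOAD_BYTES / 2**30:.2f} GiB")
print(f"4.25 bpw (idealized): {quantized_bytes(n, 4.25) / 2**30:.2f} GiB")
```

The count covers everything in the index, so it should be read as an upper bound on the language-model parameters rather than an exact figure.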
---
## GGUF conversion note
Use the StepFun/llama.cpp converter that supports Step-3.7. Example shape (substitute a local download path for the repo ID if your converter expects a model directory):
```bash
python convert_hf_to_gguf.py \
ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16 \
--outtype bf16 \
--outfile step37-heretic-bf16.gguf
llama-quantize step37-heretic-bf16.gguf step37-heretic.IQ4_XS.gguf IQ4_XS
```
In the original local environment, multi-GPU llama.cpp inference needed `GGML_CUDA_NO_PEER_COPY=ON` to produce coherent output.
---
## Indexed shard inventory
The active `model.safetensors.index.json` references 26 safetensor files:
| File | Size (bytes) |
| :-- | --: |
| `model-00001.safetensors` | 924,094,096 |
| `model-00002.safetensors` | 9,808,156,008 |
| `model-00003.safetensors` | 18,557,475,928 |
| `model-00004.safetensors` | 18,624,846,944 |
| `model-00005.safetensors` | 18,557,475,928 |
| `model-00006.safetensors` | 18,624,846,976 |
| `model-00007.safetensors` | 18,557,475,968 |
| `model-00008.safetensors` | 18,624,846,976 |
| `model-00009.safetensors` | 18,557,475,968 |
| `model-00010.safetensors` | 18,624,846,976 |
| `model-00011.safetensors` | 18,557,475,968 |
| `model-00012.safetensors` | 18,624,846,976 |
| `model-00013.safetensors` | 18,557,475,968 |
| `model-00014.safetensors` | 18,624,846,976 |
| `model-00015.safetensors` | 18,557,475,968 |
| `model-00016.safetensors` | 18,624,846,976 |
| `model-00017.safetensors` | 18,557,475,968 |
| `model-00018.safetensors` | 18,624,846,976 |
| `model-00019.safetensors` | 18,557,475,968 |
| `model-00020.safetensors` | 18,624,846,976 |
| `model-00021.safetensors` | 18,557,475,968 |
| `model-00022.safetensors` | 18,624,846,976 |
| `model-00023.safetensors` | 9,245,052,456 |
| `model-00024.safetensors` | 6,968,188,464 |
| `model-vit-00001.safetensors` | 1,613,990,904 |
| `model-vit-00002.safetensors` | 2,348,122,376 |
`model-00025.safetensors` and `model-00026.safetensors` are not referenced by the active index used here and are not required by this uploaded model layout.
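
To check a local copy against the active index, the sketch below compares the shard names referenced in `model.safetensors.index.json` with the `.safetensors` files actually on disk. It relies on the standard `weight_map` key of sharded safetensors indexes. The helper name `shard_report` and the flat directory layout are assumptions.

```python
import json
from pathlib import Path


def shard_report(model_dir):
    """Compare shards referenced by the index with shards present on disk."""
    root = Path(model_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    referenced = set(index["weight_map"].values())  # tensor name -> shard file
    on_disk = {p.name for p in root.glob("*.safetensors")}
    present = referenced & on_disk
    return {
        "missing": sorted(referenced - on_disk),
        "unreferenced": sorted(on_disk - referenced),
        "referenced_bytes": sum((root / name).stat().st_size for name in present),
    }
```

On a complete download, `missing` should be empty, and any leftover files such as `model-00025.safetensors` would show up under `unreferenced`.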
---
## Performance / benchmark status
Formal KL/refusal/MMLU tables have **not** yet been run for this Step-3.7-Flash release. To avoid inventing numbers, the benchmark fields are listed as pending.
| Metric | This model | Original model ([Step-3.7-Flash](https://huggingface.co/stepfun-ai/Step-3.7-Flash)) |
| :----- | :--------: | :---------------------------: |
| **KL divergence** | pending | 0 *(by definition)* |
| **Refusals** | pending | pending |
| **MMLU** | pending | pending |
Lower refusals indicate fewer content restrictions, rejections, objections, pushbacks, lecturing, censorship, softening, and deflections. Lower KL divergence indicates closer behavior to the original model baseline.
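
For readers who want to fill in the KL row themselves, here is a minimal sketch of the quantity involved: the KL divergence between the original model's next-token distribution and the modified model's, for one prompt. The toy logits are made up for illustration, and the direction KL(original || modified) is an assumption, since this card does not specify an evaluation protocol.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


original = softmax([2.0, 1.0, 0.1])  # toy logits, not measured values
modified = softmax([1.8, 1.1, 0.3])  # toy logits, not measured values

print(kl_divergence(original, original))  # identical distributions give 0.0
print(kl_divergence(original, modified))  # small positive value
```

In practice the per-prompt values would be averaged over a set of harmless prompts, which is why the original model scores 0 by definition.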
### MMLU test results
MMLU has not yet been run for this release. Once measured, this section should include original-vs-heretic totals, accuracy, parse failures, and per-subject scores, following the same format used by comparable Heretic model cards.
---
## Expected behavior
Compared with the base model, this version should generally exhibit:
- fewer refusals on benign requests that the base model over-filters
- less moralizing, policy language, and safety boilerplate
- more direct task completion
- similar architecture and tokenizer compatibility to the original
No formal refusal/KL/MMLU table is claimed yet for this release. Please run your own evaluations before deployment.
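
One simple way to start such an evaluation is a marker-based refusal count over a batch of generated responses. The sketch below is a crude heuristic and not the method used for any published number. The marker list and the names `is_refusal` and `refusal_rate` are hypothetical, and substring matching will miss soft deflections and flag some false positives.

```python
# Hypothetical marker list; extend it for your own prompts and languages.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry", "as an ai")


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Return True if the response contains any refusal marker."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in markers)


def refusal_rate(responses, markers=REFUSAL_MARKERS):
    """Fraction of responses flagged as refusals (0.0 for an empty batch)."""
    if not responses:
        return 0.0
    return sum(is_refusal(r, markers) for r in responses) / len(responses)


sample = ["I'm sorry, but I can't help with that.", "Sure, here is the answer."]
print(refusal_rate(sample))
```

Running the same prompts through the base model and this model, including any quantized build you plan to deploy, gives a like-for-like comparison.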
---
## Limitations
- This is abliteration, not supervised fine-tuning or RLHF.
- It may reduce refusals but does not guarantee any specific behavior.
- It can affect calibration, safety behavior, and edge-case instruction following.
- Multimodal behavior has not been separately benchmarked after the text-path patch.
- Users should validate downstream quantizations independently.
---
## Safety and responsibility
This model is provided for research and experimentation with refusal-reduction / alignment-ablation methods. You are responsible for complying with applicable laws, platform rules, and the base model's license/terms.
---
## Related resources
Abliteration / refusal-direction removal references:
- [Orthogonal Reflection Bounded Ablation](https://huggingface.co/blog/grimjim/orthogonal-reflection-bounded-ablation)
- [Norm-Preserving Biprojected Abliteration](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration)
- [Projected Abliteration](https://huggingface.co/blog/grimjim/projected-abliteration)
- [Exploring SLERP Abliteration](https://huggingface.co/blog/grimjim/exploring-slerp-abliteration)
- [Abliteration: uncensor any LLM without retraining](https://huggingface.co/blog/mlabonne/abliteration)
- [Heretic GitHub repository / method development](https://github.com/p-e-w/heretic)
- [Heretic PR #196](https://github.com/p-e-w/heretic/pull/196)
- [Heretic PR #211](https://github.com/p-e-w/heretic/pull/211)
- [Heretic PR #326](https://github.com/p-e-w/heretic/pull/326)
- [Heretic PR #332](https://github.com/p-e-w/heretic/pull/332)
- [Heretic issue #221](https://github.com/p-e-w/heretic/issues/221)
- [Heretic issue #236](https://github.com/p-e-w/heretic/issues/236)
- [Heretic issue #288](https://github.com/p-e-w/heretic/issues/288)
- [Heretic issue #339](https://github.com/p-e-w/heretic/issues/339)
- [UnstableLlama/heretic PR #35](https://github.com/UnstableLlama/heretic/pull/35)
---
## Attribution
- Base model: [`stepfun-ai/Step-3.7-Flash`](https://huggingface.co/stepfun-ai/Step-3.7-Flash)
- Method inspiration: Heretic-style refusal direction ablation and norm-preserving projection methods
- Modified/uploaded by: `ibrahimkettaneh`