Step-3.7-Flash-uncensored-abliterated-heretic-BF16

NOTE: I have tested this and, although its capabilities are intact, it still seems to respond with refusals. At least, this is what happens with the IQ4_XS GGUF quantization of it.

This is a decensored BF16 full-weight version of stepfun-ai/Step-3.7-Flash, made with a gradient-based refusal-direction abliteration method inspired by Heretic and by norm-preserving ablation work such as Magnitude/Norm-Preserving Biprojected Abliteration.

It was produced with a local gradient abliteration pass against the language model's refusal direction. The uploaded repository intentionally keeps the full HF/Transformers BF16 layout so it can be used later as a clean source for GGUF, AutoRound, AWQ, EXL3, NVFP4, GPTQ, FP8, or other quantization workflows.


Summary

| Item | Value |
| --- | --- |
| Base model | stepfun-ai/Step-3.7-Flash |
| Release type | Full BF16 safetensors |
| Model class | Step3p7ForConditionalGeneration |
| Text model class | Step3p5ForCausalLM |
| Text layers | 45 |
| Hidden size | 4096 |
| Attention heads | 64 |
| Head dim | 128 |
| Max positions | 262144 |
| Vocab size | 128896 |
| MoE layers | 3–44 |
| Experts | 288 |
| Top-k experts | 8 |
| MoE intermediate size | 1280 |
| Dense FFN intermediate size | 11264 |
| Patch target | model.layers.*.self_attn.o_proj.weight |
| Patched text layers | 0–44 |
| Abliteration strength | lambda = 0.1 |
| Stored tensor dtype | BF16 |
| Indexed parameter payload | 402,730,656,512 bytes |

What changed?

The modification targets self_attn.o_proj weights in all 45 text layers. A refusal-associated direction was extracted by gradient backpropagation through the BF16 model, then projected out of the attention output projection weights with a small norm-preserving update.
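As a rough illustration of the update described above, the following dependency-free sketch projects a unit direction out of a small weight matrix and then restores each row's original norm. The exact formula used by apply_abliteration_inplace.py is not reproduced in this card, so the update rule W' = W - lambda * r (r^T W), the function names, and the toy matrix are all assumptions made for illustration.

```python
import math


def row_norm(row):
    """Euclidean norm of one weight row."""
    return math.sqrt(sum(w * w for w in row))


def ablate_o_proj(W, direction, lam=0.1):
    """Assumed update: W' = W - lam * r (r^T W), then per-row norm restoration.

    W is a list of rows shaped (out_features, in_features); direction has
    length out_features and is normalized to a unit vector r first.
    """
    d_norm = math.sqrt(sum(x * x for x in direction))
    r = [x / d_norm for x in direction]
    n_in = len(W[0])
    # r^T W: how strongly each input column writes along the direction r.
    rtW = [sum(r[i] * W[i][j] for i in range(len(W))) for j in range(n_in)]
    patched = []
    for i, row in enumerate(W):
        # Remove a lam-sized fraction of the component along r.
        new_row = [w - lam * r[i] * rtW[j] for j, w in enumerate(row)]
        # Rescale so the row keeps the norm it had before the projection.
        new_norm = row_norm(new_row)
        scale = row_norm(row) / new_norm if new_norm > 0 else 1.0
        patched.append([w * scale for w in new_row])
    return patched


# Toy example: 3 output features, 2 input features.
toy_W = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
toy_patched = ablate_o_proj(toy_W, direction=[1.0, 1.0, 0.0], lam=0.1)
for before, after in zip(toy_W, toy_patched):
    print(round(row_norm(before), 6), round(row_norm(after), 6))
```

With lam = 0.1 only a fraction of the component along the direction is removed, which is in line with the card's description of a small update rather than a full projection.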

In plain terms, the goal was to reduce excessive refusals, moralizing, policy-style deflections, and over-filtered responses while keeping the model close to the original Step-3.7-Flash behavior.

No tokenizer vocabulary, embedding table, architecture, vision encoder, or MLP/expert tensor was intentionally changed by the abliteration pass.


Abliteration parameters

| Parameter | Value |
| --- | --- |
| Method | gradient-based orthogonal / norm-preserving abliteration |
| Direction source | refusal/harm-trigger gradient prompt |
| Target module | self_attn.o_proj |
| Target tensor glob | model.layers.*.self_attn.o_proj.weight |
| Modified layers | 0–44 |
| Lambda | 0.1 |
| Weight norm handling | per-row norm preservation after projection |
| Gradient tensor count | 45 |
| Per-layer gradient tensor shape | (1, 8, 4096) |
| Direction extraction score | -11.9375 |
| Refusal token ids used | [43, 371, 679, 1664, 9332, 34614, 100477] |
| Gradient norm range | 0.1069–31.875 |
| Mean gradient norm | 3.2397 |

Reproduction/support artifacts are included under heretic_artifacts/:

  • refusal_direction_gradients.pkl — saved gradient/refusal directions used for the BF16 patch
  • apply_abliteration_inplace.py — patch application script used for shard-wise in-place BF16 modification
  • extract_gradients.py — gradient extraction script
  • memory_guard_v2.py / run_heavy.sh — memory safety helpers used during local processing

These are included so the method can be inspected or repeated if needed. They are not required for normal inference or quantization.


Recoverability / requantization checklist

This repository should contain what is needed to rebuild downstream formats:

Required for quantization

  • config.json
  • model.safetensors.index.json
  • ✅ all indexed BF16 text shards: model-00001.safetensors through model-00024.safetensors
  • ✅ indexed VIT shards: model-vit-00001.safetensors, model-vit-00002.safetensors
  • ✅ tokenizer files: tokenizer.json, tokenizer_config.json, special_tokens_map.json
  • ✅ chat template: chat_template.jinja
  • ✅ custom code: configuration_step3p7.py, modeling_step3p7.py, processing_step3.py, vision_encoder.py
  • ✅ method/reproduction artifacts in heretic_artifacts/
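
The checklist above can be verified against a local download with a short script. This is a minimal sketch that assumes the standard Hugging Face sharded-checkpoint index layout, in which model.safetensors.index.json has a weight_map object mapping tensor names to shard file names. The function name missing_files and the constant REQUIRED_EXTRAS are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path

# Non-shard files named in the checklist; shard names are read from the index.
REQUIRED_EXTRAS = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "chat_template.jinja",
    "configuration_step3p7.py",
    "modeling_step3p7.py",
    "processing_step3.py",
    "vision_encoder.py",
]


def missing_files(repo_dir):
    """Return the sorted names of required files absent from a local repo copy."""
    root = Path(repo_dir)
    index_path = root / "model.safetensors.index.json"
    if not index_path.is_file():
        # Without the index, the shard list cannot be determined.
        return ["model.safetensors.index.json"]
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Every distinct shard file referenced by at least one tensor.
    shards = set(index["weight_map"].values())
    wanted = shards | set(REQUIRED_EXTRAS)
    return sorted(name for name in wanted if not (root / name).is_file())
```

Calling missing_files on the download directory returns an empty list when everything the index references is present, along with the tokenizer, template, and custom-code files.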

Expected downstream uses

This BF16 repo can be used as source for:

  • GGUF conversion / llama.cpp quantization
  • AutoRound
  • AWQ
  • EXL3 / exllamav3-style workflows
  • NVFP4 / FP4 experiments
  • GPTQ / FP8 / other post-training quantization methods
  • additional LoRA or delta extraction experiments

For most quantizers, use this repo exactly as the HF model path and enable remote code if needed:

MODEL=ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16

Example Transformers load

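import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16"

# trust_remote_code=True is needed because the repo ships custom modeling code.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the weights across the available devices
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain gradient abliteration in one paragraph."}]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
# Without do_sample=True, generate() decodes greedily by default and ignores
# temperature/top_p (unless the repo's generation config already enables sampling).
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95)
# The decoded string contains the prompt followed by the completion.
print(tok.decode(out[0], skip_special_tokens=False))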

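Step-3.7-Flash is very large: the indexed BF16 payload alone is 402,730,656,512 bytes (about 403 GB), so BF16 loading requires substantial combined GPU/CPU memory. For local inference, a quantized build (GGUF, EXL3, AWQ, or similar) is recommended.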
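
For a back-of-the-envelope sense of scale, the sketch below turns the indexed parameter payload from the summary table into an approximate parameter count and weight size at different precisions. It assumes every stored tensor is BF16 (2 bytes per parameter) and uses 4.25 bits per weight as an assumed average for IQ4_XS. Real quantized files mix tensor types and add metadata, so treat the results as rough estimates only.

```python
PAYLOAD_BYTES = 402_730_656_512  # "Indexed parameter payload" from the summary table


def approx_param_count(payload_bytes, bytes_per_param=2):
    """Approximate parameter count, assuming all tensors use bytes_per_param bytes."""
    return payload_bytes // bytes_per_param


def approx_size_gib(n_params, bits_per_weight):
    """Approximate weight size in GiB at a given average bits per weight."""
    return n_params * bits_per_weight / 8 / 2**30


n_params = approx_param_count(PAYLOAD_BYTES)
print(f"approx. parameters: {n_params / 1e9:.1f}B")
for label, bpw in [
    ("BF16", 16.0),
    ("8-bit", 8.0),
    ("4.25 bpw (assumed IQ4_XS average)", 4.25),
]:
    print(f"{label}: about {approx_size_gib(n_params, bpw):.0f} GiB of weights")
```

These figures cover weights only. Activations and the KV cache, which grows with context length, add to the total.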


GGUF conversion note

Use the StepFun/llama.cpp converter that supports Step-3.7. Example shape:

python convert_hf_to_gguf.py \
  ibrahimkettaneh/Step-3.7-Flash-uncensored-abliterated-heretic-BF16 \
  --outtype bf16 \
  --outfile step37-heretic-bf16.gguf

llama-quantize step37-heretic-bf16.gguf step37-heretic.IQ4_XS.gguf IQ4_XS

For multi-GPU llama.cpp inference in the original local environment, llama.cpp had to be built with GGML_CUDA_NO_PEER_COPY=ON to produce coherent output.


Indexed shard inventory

The active model.safetensors.index.json references 26 safetensor files:

| File | Size (bytes) |
| --- | --- |
| model-00001.safetensors | 924,094,096 |
| model-00002.safetensors | 9,808,156,008 |
| model-00003.safetensors | 18,557,475,928 |
| model-00004.safetensors | 18,624,846,944 |
| model-00005.safetensors | 18,557,475,928 |
| model-00006.safetensors | 18,624,846,976 |
| model-00007.safetensors | 18,557,475,968 |
| model-00008.safetensors | 18,624,846,976 |
| model-00009.safetensors | 18,557,475,968 |
| model-00010.safetensors | 18,624,846,976 |
| model-00011.safetensors | 18,557,475,968 |
| model-00012.safetensors | 18,624,846,976 |
| model-00013.safetensors | 18,557,475,968 |
| model-00014.safetensors | 18,624,846,976 |
| model-00015.safetensors | 18,557,475,968 |
| model-00016.safetensors | 18,624,846,976 |
| model-00017.safetensors | 18,557,475,968 |
| model-00018.safetensors | 18,624,846,976 |
| model-00019.safetensors | 18,557,475,968 |
| model-00020.safetensors | 18,624,846,976 |
| model-00021.safetensors | 18,557,475,968 |
| model-00022.safetensors | 18,624,846,976 |
| model-00023.safetensors | 9,245,052,456 |
| model-00024.safetensors | 6,968,188,464 |
| model-vit-00001.safetensors | 1,613,990,904 |
| model-vit-00002.safetensors | 2,348,122,376 |

model-00025.safetensors and model-00026.safetensors are not referenced by the active index used here and are not required by this uploaded model layout.
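
As a consistency check on the inventory, the sketch below sums the listed file sizes and compares the total with the indexed parameter payload reported in the summary. The small positive difference would be consistent with per-file safetensors headers. That interpretation is an assumption, not something this card states.

```python
PAYLOAD_BYTES = 402_730_656_512  # indexed parameter payload from the summary

# File sizes in bytes, copied from the inventory above.
SHARD_BYTES = {
    "model-00001.safetensors": 924_094_096,
    "model-00002.safetensors": 9_808_156_008,
    "model-00003.safetensors": 18_557_475_928,
    "model-00004.safetensors": 18_624_846_944,
    "model-00005.safetensors": 18_557_475_928,
    "model-00006.safetensors": 18_624_846_976,
    "model-00007.safetensors": 18_557_475_968,
    "model-00008.safetensors": 18_624_846_976,
    "model-00009.safetensors": 18_557_475_968,
    "model-00010.safetensors": 18_624_846_976,
    "model-00011.safetensors": 18_557_475_968,
    "model-00012.safetensors": 18_624_846_976,
    "model-00013.safetensors": 18_557_475_968,
    "model-00014.safetensors": 18_624_846_976,
    "model-00015.safetensors": 18_557_475_968,
    "model-00016.safetensors": 18_624_846_976,
    "model-00017.safetensors": 18_557_475_968,
    "model-00018.safetensors": 18_624_846_976,
    "model-00019.safetensors": 18_557_475_968,
    "model-00020.safetensors": 18_624_846_976,
    "model-00021.safetensors": 18_557_475_968,
    "model-00022.safetensors": 18_624_846_976,
    "model-00023.safetensors": 9_245_052_456,
    "model-00024.safetensors": 6_968_188_464,
    "model-vit-00001.safetensors": 1_613_990_904,
    "model-vit-00002.safetensors": 2_348_122_376,
}

total_bytes = sum(SHARD_BYTES.values())
overhead_bytes = total_bytes - PAYLOAD_BYTES  # bytes not accounted for by tensor data

print(f"files listed:   {len(SHARD_BYTES)}")
print(f"total on disk:  {total_bytes:,} bytes")
print(f"beyond payload: {overhead_bytes:,} bytes")
```

The same dictionary can be compared against the sizes of a local download to spot truncated or partially transferred shards before starting a long quantization run.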


Performance / benchmark status

Formal KL/refusal/MMLU tables have not yet been run for this Step-3.7-Flash release. To avoid inventing numbers, the benchmark fields are listed as pending.

| Metric | This model | Original model (Step-3.7-Flash) |
| --- | --- | --- |
| KL divergence | pending | 0 (by definition) |
| Refusals | pending | pending |
| MMLU | pending | pending |

A lower refusal count indicates fewer content restrictions, rejections, objections, pushbacks, lecturing, censorship, softening, and deflections. A lower KL divergence indicates behavior closer to the original model baseline.
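
To make the KL metric concrete, here is a minimal sketch that computes the KL divergence between two next-token distributions given as raw logits. The direction used here (original model as P, patched model as Q) and the idea of averaging over a prompt set are assumptions. The card does not specify the evaluation protocol, and the toy logits are invented purely for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats, with P and Q given as logits over the same vocabulary."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(pairs):
    """Average KL over (original_logits, patched_logits) pairs, one per prompt."""
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy logits over a 4-token vocabulary (illustrative values only).
original = [2.0, 0.5, -1.0, 0.0]
patched = [1.8, 0.7, -0.9, 0.1]
print(kl_divergence(original, original))  # identical distributions
print(kl_divergence(original, patched))   # small perturbation
```

Identical logits give a divergence of zero, which is why the original model's entry in the table is 0 by definition.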

MMLU test results

MMLU has not yet been run for this release. Once measured, this section should include original-vs-heretic totals, accuracy, parse failures, and per-subject scores, following the same format used by comparable Heretic model cards.


Expected behavior

Compared with the base model, this version should generally exhibit:

  • fewer refusals on benign requests that the base model over-filters
  • less moralizing, policy language, and safety boilerplate
  • more direct task completion
  • similar architecture and tokenizer compatibility to the original

No formal refusal/KL/MMLU table is claimed yet for this release. Please run your own evaluations before deployment.
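
For readers running their own evaluations, a very simple refusal counter might look like the sketch below. The marker phrases are hypothetical and not taken from this card or from Heretic. Substring matching is a crude heuristic that misses paraphrased refusals and can flag harmless text, so results should be spot-checked by hand.

```python
# Hypothetical marker phrases; adjust them to the refusal style of the model under test.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i'm sorry",
    "i won't",
    "i am unable to",
)


def is_refusal(response, markers=REFUSAL_MARKERS):
    """Heuristic: flag a response containing any marker phrase (case-insensitive)."""
    text = response.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(marker in text for marker in markers)


def refusal_rate(responses):
    """Fraction of responses flagged as refusals; 0.0 for an empty list."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)


sample = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is a short overview of the topic.",
]
print(refusal_rate(sample))
```

Running the same prompt set through both the base model and this model, and through any quantized build, gives comparable rates for filling in the pending table entries.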


Limitations

  • This is abliteration, not supervised fine-tuning or RLHF.
  • It may reduce refusals but does not guarantee any specific behavior.
  • It can affect calibration, safety behavior, and edge-case instruction following.
  • Multimodal behavior has not been separately benchmarked after the text-path patch.
  • Users should validate downstream quantizations independently.

Safety and responsibility

This model is provided for research and experimentation with refusal-reduction / alignment-ablation methods. You are responsible for complying with applicable laws, platform rules, and the base model's license/terms.


Attribution

  • Base model: stepfun-ai/Step-3.7-Flash
  • Method inspiration: Heretic-style refusal direction ablation and norm-preserving projection methods
  • Modified/uploaded by: ibrahimkettaneh