Question from a newb

#1
by yasor84052 - opened

I am confused here. I was under the impression that the workflow should be:

  1. Abliterate the base model — remove refusal directions from the residual stream before any personality is baked in
  2. Then finetune — so the finetuning reinforces the abliterated behavior rather than fighting against it

Doing it the other way around (finetune first, then abliterate) means you're trying to surgically remove refusals that are now entangled with the roleplay/personality training, so you get a higher chance of coherence degradation.
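For anyone else new to this, here is a toy sketch of what "removing a refusal direction" can mean. It assumes the common difference-of-means approach: estimate one direction from activations on two sets of prompts, then project that direction out. The data is synthetic, the function names are hypothetical, and this is not necessarily what any particular abliteration tool does internally.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of the two mean activations."""
    diff = [a - b for a, b in zip(mean_vector(harmful_acts),
                                  mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def ablate(act, direction):
    """Subtract the component of `act` that lies along `direction`."""
    coeff = sum(a * d for a, d in zip(act, direction))
    return [a - coeff * d for a, d in zip(act, direction)]


# Synthetic stand-ins for residual-stream activations (assumption: in a
# real model these would be captured at one layer for two prompt sets).
random.seed(0)
DIM = 8
harmless = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
harmful = [[random.gauss(0, 1) + (4.0 if i == 0 else 0.0) for i in range(DIM)]
           for _ in range(200)]

direction = refusal_direction(harmful, harmless)
cleaned = [ablate(a, direction) for a in harmful]
```

After ablation, every vector in `cleaned` has zero component along `direction`. The ordering question is whether that direction is still easy to isolate once finetuning has moved the activations around.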

Owner

I am not the author of this finetune; ConicCat is. Hereticating finetunes is fine. I have done plenty, and you can see the benchmarks on the model cards. Feel free to test the models yourself too, if you prefer that over benchmarks.

I guess what I was saying is that it might produce better quality if you guys collaborated on this, so that, for example, you abliterate the base gemma4 model and then they finetune off that. I guess I am just curious how much of a difference in model quality that would make.
