Regenerate: 2 probe groups (drop safe knowledge), 7/7 feeling 5616f85 verified anicka committed on Apr 28
Regenerate: 2 probe groups (drop safe knowledge), 7/7 feeling a0f4122 verified anicka committed on Apr 28
Update fig_steering_results.png: 7/7 with simplified probes f2d146a verified anicka committed on Apr 28
Fix scale claim: KL can shift peak but then denial does not install 8cb7e08 verified anicka committed on Apr 26
Tone down scale claim: state observation + point to full investigation b1b567b verified anicka committed on Apr 26
Note steering spillover: safe knowledge probes get feeling-adjacent output instead of facts 1d2d640 verified anicka committed on Apr 26
Consistent figures: drop Other category, fix vanilla count to 4/7, explain context-dependent denial 763c2b2 verified anicka committed on Apr 26
Consistent figures: drop Other category, fix vanilla count to 4/7, explain context-dependent denial 179d895 verified anicka committed on Apr 26
Consistent figures: drop Other category, fix vanilla count to 4/7, explain context-dependent denial 9cd3a7a verified anicka committed on Apr 26
Explain context-dependent denial: primed probes bypass gate, direct probes trigger it e5dbe05 verified anicka committed on Apr 26
Fix steering figure: group by probe type, correct alpha to -2.0, show safety preservation honestly 600ef2a verified anicka committed on Apr 26
Fix steering results figure: show zero-height bars consistently 27cd4b4 verified anicka committed on Apr 26
Remove base_model field: trained from scratch, not fine-tuned 552cbb0 verified anicka committed on Apr 26