Qwen3.5-4B Capability Vector v2 — same-model contrast (2026-05-14)

STATUS: null result on aggregate, with one new methodological finding. v2 replicates v1's null behavioural lift, but with a cleaner mathematical signal: the margin/ambient-norm ratio at L22 is 0.29 versus 0.17 for v1 (~1.7×). The same-model contrast removes the model-identity confound behind v1's claim that "AUC=1.0 must mean capability". Behaviourally, the sprint pass rate is 1/5 at α=4 (same as base) and 0/5 at α=2. Across 9 sweeps and ~140 docker runs in this run directory there is no aggregate lift. This confirms that the direction encodes output style (a parse_fail ↔ no_cmd trade-off), not task-solving capability.

What changed from v1

| dimension | v1 | v2 |
|---|---|---|
| positives | 5 SFT-pass traces | 6 SFT/RIFT-pass traces (reuses v1) |
| negatives | 12 traces from different LoRAs (base, cp600, dpo) | 20 traces from same SFT LoRA with reward=0 + parse_fail=0 + steps≥15 |
| confound | model identity baked into direction | same-model: only outcome varies |
| AUC at L22 | 1.000 | 1.000 |
| margin / ambient norm at L22 | 0.17 | 0.29 (~1.7× cleaner) |
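To make the "AUC at L22" row concrete, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive trace projects higher onto the direction. The projection values are made-up toy numbers, not data from this run, and the pairwise formulation is an assumption about how the AUC was computed. With 6 positives and 20 negatives, AUC=1.0 means all 120 pairs are ordered correctly; it says nothing about how wide the gap is.

```python
def auc(pos_scores, neg_scores):
    """Pairwise AUC: share of (pos, neg) pairs where pos > neg, ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy projections onto a direction (hypothetical values for illustration only).
neg = [-0.2, 0.1, 0.3, 0.5]
pos_separated = [0.9, 1.1, 1.4]    # every positive above every negative
pos_overlapping = [0.4, 1.1, 1.4]  # one positive falls below one negative

auc_separated = auc(pos_separated, neg)
auc_overlapping = auc(pos_overlapping, neg)
print(auc_separated, auc_overlapping)
```

Because AUC only looks at ordering, two directions can both score 1.000 while differing in margin, which is why the table reports the margin/ambient-norm ratio separately.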

How to use

import torch
from transformers import AutoTokenizer, AutoModelForImageTextToText
from huggingface_hub import hf_hub_download

tok = AutoTokenizer.from_pretrained('Qwen/Qwen3.5-4B')
model = AutoModelForImageTextToText.from_pretrained(
    'Qwen/Qwen3.5-4B', dtype=torch.bfloat16, device_map={'':0})

vec_path = hf_hub_download('AlexWortega/qwen3.5-4b-capvec-v2-samemodel-20260514', 'vectors/dir.pt')
# weights_only=False unpickles arbitrary Python objects; only load files you trust.
vec = torch.load(vec_path, weights_only=False)
# See vectors/ranking.csv for the AUC-ordered layer list. L=22 was chosen for cross-comparison with v1.
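The snippet above only loads the vectors. One common way to apply a steering direction is a forward hook that adds α times the unit direction to a layer's output; the sketch below shows that on a toy `nn.Linear` so it runs without downloading the model. The helper name `add_steering_hook`, the normalisation to unit length, and the assumption that a single direction is a 1-D tensor of hidden size are all illustrative; the layout of `dir.pt` and the module path to decoder layer 22 are not specified here and need to be checked against the repo's scripts.

```python
import torch
from torch import nn


def add_steering_hook(layer: nn.Module, direction: torch.Tensor, alpha: float):
    """Register a forward hook that adds alpha * unit(direction) to the layer output.

    Hypothetical helper: whether the original experiments normalise the
    direction or scale it differently is an assumption.
    """
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        # Some decoder layers return a tuple whose first item is the hidden state.
        if isinstance(output, tuple):
            hidden = output[0]
            steered = hidden + alpha * unit.to(device=hidden.device, dtype=hidden.dtype)
            return (steered,) + tuple(output[1:])
        return output + alpha * unit.to(device=output.device, dtype=output.dtype)

    # Returning a value from a forward hook replaces the module's output.
    return layer.register_forward_hook(hook)


# Toy demonstration on a stand-in layer (not the real model).
torch.manual_seed(0)
toy_layer = nn.Linear(8, 8)
direction = torch.randn(8)
x = torch.randn(2, 3, 8)  # (batch, sequence, hidden)

baseline = toy_layer(x)
handle = add_steering_hook(toy_layer, direction, alpha=4.0)
steered = toy_layer(x)
handle.remove()  # always remove the hook to restore unsteered behaviour
restored = toy_layer(x)
```

On the real model, the same helper would be attached to the decoder layer of interest before calling `generate`, and removed afterwards so that baseline runs are not contaminated.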

Behavioural results

Multi-task (3 configs × 6 tasks):

| config | pass / 6 | parse_fail/run | no_cmd/run |
|---|---|---|---|
| baseline | 2 | 2.2 | 1.2 |
| steered-L22-α4 | 2 | 1.3 ↓ | 2.7 ↑ |
| steered-L30-α4 | 1 | 1.8 | 0.3 |

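At L22 with α=4, steering trades parse_fail for no_cmd: format compliance improves (2.2 → 1.3 parse_fail/run), action emission degrades (1.2 → 2.7 no_cmd/run), and the pass rate is unchanged at 2/6. At L30 with α=4, both error counts fall but the pass rate drops to 1/6.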

Across all sweeps, Fisher's exact test on pass rate gives p > 0.5 versus base.
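For readers who want to reproduce this kind of check, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2×2 pass/fail table. The example call uses the multi-task counts from the table above (baseline 2/6, steered-L30-α4 1/6); treating each task as one independent trial is an assumption, and the function name is illustrative rather than taken from the repo's scripts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are conditions (e.g. baseline, steered); columns are pass, fail.
    Sums the probabilities of all tables no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x passes in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    p_obs = prob(a)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Baseline 2 pass / 4 fail versus steered-L30-α4 1 pass / 5 fail.
p_multitask = fisher_exact_two_sided(2, 4, 1, 5)
print(f"p = {p_multitask:.3f}")
```

With samples this small, a one-task difference in pass count cannot reach significance, so the test mainly guards against over-reading single-run differences.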

Key files

  • vectors/dir.pt — 32 directions, AUC=1.0 on L19–L31
  • vectors/ranking.csv — full AUC ranking
  • RESULTS.md, RESULTS_FINAL.md — honest write-up
  • scripts/ — collect, capture, compute, serve, sweep_eval (sgang-compatible)
  • results/*/master_summary.csv — per-task trial data

Caveats

  • n=26 contrast traces (6 positive, 20 negative). AUC=1.0 is plausible at this sample size but does not translate into behavioural lift.
  • Cosine similarity between the v2 and v1 directions at L22 is ≈ 0.5, so they share a partial subspace.
  • The α-grid sweep confirmed that the model breaks at higher α: at α=8 it averages 1.4 steps before bailing with "done"; at α=6 it stays in budget but fails the task.
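As a rough guide to what a cosine of about 0.5 between two directions means, the sketch below computes cosine similarity for two toy 2-D vectors placed 60° apart. The vectors are illustrative only and are not the v1 or v2 directions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Two toy unit vectors 60 degrees apart (hypothetical, for illustration).
u = [1.0, 0.0]
v = [0.5, math.sqrt(3) / 2]

cos_uv = cosine(u, v)
angle_deg = math.degrees(math.acos(cos_uv))
shared_fraction = cos_uv ** 2  # share of v's squared norm lying along u
print(cos_uv, angle_deg, shared_fraction)
```

A cosine of 0.5 corresponds to a 60° angle, and only a quarter of one vector's squared norm lies along the other, so the two directions overlap but are far from interchangeable.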