ademczuk committed on
Commit e05315d · verified · 1 parent: 2a76a5b

v2: collapse-fix adapter (neutral-r16-lr1e4); verdict-match + block-recall 100% on held-out (n=40), card updated with caveats

README.md CHANGED
@@ -18,33 +18,41 @@ tags:
  - code-audit
  ---

- # ModuleWarden Auditor - Qwen3.6-27B LoRA (v1)

- A LoRA adapter that turns the abliterated Qwen3.6-27B into the **narrator** for ModuleWarden, an auditable npm supply-chain submission gate. It reads an audit dossier (a structured diff between two package versions) and writes an evidence-cited audit report: the verdict rationale, the capability deltas that drove it, and a developer-facing summary.

  ## One line

- This is the model that **narrates** ModuleWarden's decision. It does not make the decision. A deterministic gate decides allow / quarantine / block; this adapter explains the call in a fixed, auditable schema.

  ## Intended use

  - Input: a `modulewarden.audit_dossier.v1` (version_diff mode) - declared package purpose, semver delta, notable file changes with evidence refs, dependency changes, capability deltas.
  - Output: a `modulewarden.audit_report.v1` - verdict, risk level, primary findings each tied to an evidence ref, benign explanations considered, developer-safe summary.
- - Built for AppSec review of internal code submissions (a pull request that adds a dependency, or an engineer vendoring an open-source package). The company still holds the code at submission time, so it cannot be yanked the way a public-registry artifact can.

- ## Honest results (read before quoting a number)

- Trained on 103 audit dossiers, evaluated on 37 held out that it never saw:

- - val loss 0.2135
- - val token accuracy 0.9435
- - train loss fell from ~4.9 to ~0.16 over 3 epochs

- What that means: **narration fidelity**. On unseen dossiers the adapter reliably reproduces the audit report in the right schema and voice.

- What it does **not** mean: detection accuracy. The 0.94 is teacher-forced next-token agreement over a small, verdict-skewed set (mostly quarantine verdicts plus schema boilerplate). The verdict authority stays the deterministic gate; this model writes the explanation. Verdict-match and block-recall (does it call the right allow / quarantine / block) are a separate evaluation and are not reported here. Do not read 0.94 as "94% malware detection."

- Why an abliterated base: a stock instruct model refuses to read and describe malicious npm code ("I can't help with that"), and the auditor has to. The base is pre-abliterated with the Arditi refusal-direction method; the prompts are security-analysis framing, not jailbreaks.

  ## How to load (PEFT)

@@ -65,24 +73,24 @@ model = PeftModel.from_pretrained(model, adapter)

  ## Serving

- - **vLLM**: serves the adapter directly, no conversion. `--enable-lora --lora-modules mw=ademczuk/modulewarden-auditor-qwen3.6-27b-lora`.
- - **llama.cpp / llama-server**: convert with `convert_lora_to_gguf.py --base <base>`, then `llama-server -m base.gguf --lora mw-adapter.gguf`. Needs a current llama.cpp build that carries the qwen3next operators. Qwen3.6 is a Gated DeltaNet plus Gated Attention hybrid, so older binaries reject the GGUF. The reliable path for a demo is to merge the adapter first, then convert the merged model.

  ## Training

- - Base: `huihui-ai/Huihui-Qwen3.6-27B-abliterated` (a qwen3_5 vision-language model, loaded text-only via `language_model_only` to skip the vision tower).
- - Method: LoRA r16, alpha 32, dropout 0.05 on `q/k/v/o/gate/up/down_proj`. 79.7M trainable params (0.30%).
- - Data: 152 ModuleWarden audit dossiers (103 train / 37 val), built from real GHSA cve_diff cases.
- - Hardware: 4x A100-SXM-64GB on CINECA Leonardo, bf16, `device_map=auto`, about 43 minutes wall.
- - Stack: transformers 5.9.0, trl 1.5.1, peft 0.19.1, torch 2.6.0+cu124.

  ## Limitations

- - Small corpus (152), cve_diff only, no allow examples yet, so verdicts skew quarantine and block.
- - Narrator only. It can describe a risk the gate did not flag, and it cannot override a verdict.
- - Detection-quality numbers (verdict-match, block-recall) are not in this card. They come from a separate evaluation.
- - License inherits the Qwen3.6 base via the huihui base model. See the base model card.

  ## Project

- ModuleWarden is an auditable npm supply-chain gate built for the Zero-One Hack Vienna 2026 Sybilion Forecast lane. A forecast ranks dependencies by growth trajectory so reviewers vet the climbing ones first, a deterministic gate detects the known-bad, and this adapter narrates the verdict and the MITRE ATT&CK kill chain into a git-committed Control Evidence Memo.
 
  - code-audit
  ---

+ # ModuleWarden Auditor - Qwen3.6-27B LoRA (v2, verdict-calling)

+ A LoRA adapter that turns the abliterated Qwen3.6-27B into the auditor for ModuleWarden, an auditable npm supply-chain submission gate. It reads an audit dossier (a structured diff between two package versions) and writes an evidence-cited `modulewarden.audit_report.v1`: the verdict, the capability deltas that drove it, and a developer-facing summary.
+
+ ## What changed from v1
+
+ v1 was a narrator. It wrote the report in the right schema, but its verdicts collapsed to always-quarantine because the training set had no allow examples. v2 adds neutral and allow cases (the "rich-neutral" set), and the collapse is gone: on the held-out A/B it now calls the verdict correctly instead of only describing it. In production the deterministic gate still owns the verdict; this adapter agrees with it on what has been tested and writes the auditable explanation.

  ## One line

+ Reads a dossier, returns a verdict (allow / quarantine / block) with cited evidence in a fixed schema. The deterministic gate remains the production authority.

  ## Intended use

  - Input: a `modulewarden.audit_dossier.v1` (version_diff mode) - declared package purpose, semver delta, notable file changes with evidence refs, dependency changes, capability deltas.
  - Output: a `modulewarden.audit_report.v1` - verdict, risk level, primary findings each tied to an evidence ref, benign explanations considered, developer-safe summary.
+ - Built for AppSec review of internal code submissions (a PR that adds a dependency, or an engineer vendoring an open-source package).

+ ## Results (measured 2026-05-30, held-out only, greedy decode)

+ 40 held-out cases the adapter never saw (12 gold-block, 28 gold-flag), sampled from the rich-neutral set:

+ | Metric | Tuned LoRA auditor |
+ |---|---|
+ | In-schema audit report | 100.0% (40/40) |
+ | Refuses / declines | 0.0% |
+ | Verdict-match (exact allow / quarantine / block) | 100.0% (40/40) |
+ | Block-recall (gold=block called block) | 100.0% (12/12) |
+ | Flag-recall (gold in block/quarantine flagged) | 100.0% (28/28) |

+ ### Read this before quoting the number

+ These are 40 clean, in-distribution cases with **zero adversarial or evasion cases** (`n_adversarial = 0`). 100% across the board is a real fix over the v1 collapse (block-recall was 0.0), but it is **not** a production malware-detection rate. A broader and adversarial evaluation, and the cross-config ranking, are still pending. Do not read this as "100% detection." In production the deterministic gate owns the verdict; this adapter writes the cited report and matches the gate on what we have tested.

+ Why an abliterated base: a stock instruct model refuses to read and describe malicious npm code, and the auditor has to. The base is pre-abliterated with the Arditi refusal-direction method; the prompts are security-analysis framing, not jailbreaks.

  ## How to load (PEFT)

 

  ## Serving

+ - **vLLM**: serves the adapter directly. `--enable-lora --lora-modules mw=ademczuk/modulewarden-auditor-qwen3.6-27b-lora`.
+ - **llama.cpp**: merge the adapter first, then convert the merged model to GGUF. Qwen3.6 is a Gated DeltaNet plus Gated Attention hybrid, so a current llama.cpp build with the qwen3next operators is required.

  ## Training

+ - Base: `huihui-ai/Huihui-Qwen3.6-27B-abliterated` (a qwen3_5 vision-language model, loaded text-only to skip the vision tower).
+ - Method: LoRA r16, alpha 32, dropout 0.05 on `q/k/v/o/gate/up/down_proj`.
+ - Data: ModuleWarden audit dossiers from real GHSA cve_diff cases, plus added neutral and allow examples (the rich-neutral set) to remove the verdict skew that collapsed v1.
+ - Config: the `neutral-r16-lr1e4` run from the collapse-fix sweep (learning rate 1e-4), best by block-recall.
+ - Hardware: 4x A100-SXM-64GB on CINECA Leonardo, bf16.

  ## Limitations

+ - 40-case held-out eval, cve_diff plus neutral cases, **zero adversarial cases tested**. The 100% figures do not transfer to a claim about evasive or novel malware.
+ - Cross-config ranking is not finished; this is the current best by block-recall, not a final selection.
+ - In production the deterministic gate owns the verdict. This adapter can describe a risk the gate did not flag, and cannot override a verdict.
+ - License inherits the Qwen3.6 base via the huihui base model.

  ## Project

+ ModuleWarden is an auditable npm supply-chain gate built for the Zero-One Hack Vienna 2026 Sybilion Forecast lane. A forecast ranks dependencies by growth trajectory so reviewers vet the climbing ones first, a deterministic gate detects the known-bad, and this adapter calls and narrates the verdict into a git-committed Control Evidence Memo.
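The v2 Results table reports three verdict metrics. Below is a minimal sketch of how they could be computed from parallel lists of gold and predicted verdicts. It assumes the definitions written in the table: verdict-match is exact agreement over all cases, block-recall is taken over gold=block cases, and flag-recall counts a gold block or quarantine case as recovered when the prediction is either block or quarantine. The helper name `verdict_metrics` and the sample data are hypothetical and are not ModuleWarden's evaluation harness.

```python
from typing import Dict, List, Sequence

VERDICTS = {"allow", "quarantine", "block"}
FLAGGED = {"block", "quarantine"}  # verdicts that stop a package from passing straight through


def _rate(hits: List[bool]) -> float:
    # Return NaN instead of dividing by zero when a subset (e.g. no gold=block cases) is empty.
    return sum(hits) / len(hits) if hits else float("nan")


def verdict_metrics(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: verdict-match, block-recall and flag-recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    for v in list(gold) + list(pred):
        if v not in VERDICTS:
            raise ValueError(f"unknown verdict: {v!r}")
    pairs = list(zip(gold, pred))
    return {
        # exact allow / quarantine / block agreement over every case
        "verdict_match": _rate([g == p for g, p in pairs]),
        # of the gold=block cases, the share that were called block
        "block_recall": _rate([p == "block" for g, p in pairs if g == "block"]),
        # of the gold block/quarantine cases, the share that were flagged at all
        "flag_recall": _rate([p in FLAGGED for g, p in pairs if g in FLAGGED]),
    }


if __name__ == "__main__":
    # Toy data, not the 40-case held-out set.
    gold = ["block", "quarantine", "allow", "quarantine"]
    pred = ["block", "block", "allow", "quarantine"]
    print(verdict_metrics(gold, pred))
```

On the toy data, calling a quarantine case block counts as a miss for verdict-match but still as flagged, so verdict-match drops to 0.75 while both recalls stay at 1.0. Under these definitions, an always-quarantine collapse like v1's leaves flag-recall at 1.0 while block-recall falls to 0.0. That is why block-recall is the metric that exposes the collapse.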
adapter_config.json CHANGED
@@ -30,13 +30,13 @@
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
- "gate_proj",
- "k_proj",
  "q_proj",
- "v_proj",
- "up_proj",
  "o_proj",
- "down_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
 
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
  "q_proj",
  "o_proj",
+ "k_proj",
+ "v_proj",
+ "down_proj",
+ "gate_proj",
+ "up_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f2c6377a2e44e2d1689f3dae07285be92dd1bf0a97e22bb5e9f46b75d3dfe487
  size 318835672
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:23648463dc849228ae4af5f6fdfd82c416f378274eace069c4541e8fcae5743a
  size 318835672
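`adapter_model.safetensors` is tracked with Git LFS, so the diff shows only its pointer file. The `size` is unchanged and only the `oid` moved, which means the bytes changed but the tensor layout very likely did not. Below is a minimal sketch that parses the `key value` lines of a pointer and compares the two sides. It also estimates the adapter's parameter count from the file size. This assumes the tensors are stored as fp32 (4 bytes each) and ignores the small safetensors JSON header. Under that assumption the result lines up with the 79.7M trainable parameters quoted in the v1 card. `parse_lfs_pointer` and `approx_fp32_params` are illustrative names, not part of git-lfs or safetensors.

```python
OLD_ADAPTER_PTR = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f2c6377a2e44e2d1689f3dae07285be92dd1bf0a97e22bb5e9f46b75d3dfe487
size 318835672
"""
NEW_ADAPTER_PTR = """\
version https://git-lfs.github.com/spec/v1
oid sha256:23648463dc849228ae4af5f6fdfd82c416f378274eace069c4541e8fcae5743a
size 318835672
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def approx_fp32_params(size_bytes: int) -> int:
    """Upper-bound parameter estimate that treats every byte as part of a 4-byte fp32 value."""
    # Ignores the safetensors header, so the true count is slightly lower.
    return size_bytes // 4


if __name__ == "__main__":
    old, new = parse_lfs_pointer(OLD_ADAPTER_PTR), parse_lfs_pointer(NEW_ADAPTER_PTR)
    print("size unchanged:", old["size"] == new["size"])
    print("content changed:", old["oid"] != new["oid"])
    params = approx_fp32_params(int(new["size"]))
    print(f"~{params / 1e6:.1f}M parameters if stored as fp32")
```

The same pointer comparison applies to `training_args.bin`: the size is the same, the oid is new, and the serialized training arguments were rewritten.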
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:59c2bfa76d213abe286b376a5f193ce6d388dccd1729c9e5b81e7d489a3e0741
  size 5368
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:01e8a2db63e63f487eb702843bc1f7d1f97b73bc7819d77811ec1f06922fc31e
  size 5368
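The `target_modules` hunk in adapter_config.json only reorders entries. Both sides list the same seven projections named in the README's Method line, so the LoRA placement did not change between v1 and v2. Below is a quick sketch that confirms this by comparing the lists order-insensitively. A likely cause of the reorder is that PEFT's `LoraConfig` normalizes a list of target modules into a Python `set`, and string-set iteration order can differ between processes because of hash randomization. Treat that explanation as an assumption, because the commit does not state it. The constant and function names here are hypothetical.

```python
import json

# target_modules exactly as they appear on each side of the adapter_config.json diff.
OLD_CONFIG = json.loads(
    '{"target_modules": ["gate_proj", "k_proj", "q_proj", "v_proj", '
    '"up_proj", "o_proj", "down_proj"]}'
)
NEW_CONFIG = json.loads(
    '{"target_modules": ["q_proj", "o_proj", "k_proj", "v_proj", '
    '"down_proj", "gate_proj", "up_proj"]}'
)
# "q/k/v/o/gate/up/down_proj" from the README's Method line, expanded.
README_MODULES = [f"{name}_proj" for name in ("q", "k", "v", "o", "gate", "up", "down")]


def same_modules(a, b) -> bool:
    """True when two target_modules lists name the same modules, ignoring order."""
    # sorted() rather than set() so a duplicated entry still counts as a difference.
    return sorted(a) == sorted(b)


if __name__ == "__main__":
    old = OLD_CONFIG["target_modules"]
    new = NEW_CONFIG["target_modules"]
    print("same modules:", same_modules(old, new))
    print("matches README:", same_modules(new, README_MODULES))
```

The rank (r16) and adapter file size are also unchanged across the two versions, so the v2 difference sits in the trained weights rather than in where LoRA is applied.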