---
library_name: peft
base_model: huihui-ai/Huihui-Qwen3.6-27B-abliterated
pipeline_tag: text-generation
language:
- en
license: other
license_name: qwen
license_link: https://huggingface.co/huihui-ai/Huihui-Qwen3.6-27B-abliterated
tags:
- lora
- peft
- sft
- trl
- security
- supply-chain
- npm
- code-audit
---

# ModuleWarden Auditor - Qwen3.6-27B LoRA (v2, verdict-calling)

A LoRA adapter that turns the abliterated Qwen3.6-27B into the auditor for ModuleWarden, an auditable npm supply-chain submission gate. It reads an audit dossier (a structured diff between two package versions) and writes an evidence-cited `modulewarden.audit_report.v1`: the verdict, the capability deltas that drove it, and a developer-facing summary.

## What changed from v1

v1 was a narrator. It wrote the report in the right schema, but its verdicts collapsed to always-quarantine because the training set had no allow examples. v2 adds neutral and allow cases (the "rich-neutral" set), and the collapse is gone: on the held-out A/B it now calls the verdict correctly, not just describes it. In production the deterministic gate still owns the verdict; this adapter agrees with it on what has been tested and writes the auditable explanation.

## One line

Reads a dossier, returns a verdict (allow / quarantine / block) with cited evidence in a fixed schema. The deterministic gate remains the production authority.

## Intended use

- Input: a `modulewarden.audit_dossier.v1` (version_diff mode) - declared package purpose, semver delta, notable file changes with evidence refs, dependency changes, capability deltas.
- Output: a `modulewarden.audit_report.v1` - verdict, risk level, primary findings each tied to an evidence ref, benign explanations considered, developer-safe summary.
- Built for AppSec review of internal code submissions (a PR that adds a dependency, or an engineer vendoring an open-source package).

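
The input/output contract above implies two checks a consumer can run on any generated report: the verdict is one of the three allowed values, and every finding cites an evidence ref that exists in the dossier. A minimal sketch of those checks follows. The field names (`schema`, `verdict`, `primary_findings`, `evidence_ref`) and the ref format are assumptions made for illustration; this card does not publish the actual `modulewarden.audit_report.v1` layout.

```python
# Hypothetical field names throughout: the real audit_report.v1 layout is not
# documented in this card, so treat this as a shape check sketch only.
ALLOWED_VERDICTS = {"allow", "quarantine", "block"}


def check_report(report: dict, dossier_refs: set) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if report.get("schema") != "modulewarden.audit_report.v1":
        problems.append("unexpected schema id")
    if report.get("verdict") not in ALLOWED_VERDICTS:
        problems.append(f"unknown verdict: {report.get('verdict')!r}")
    # Every finding must point at evidence that is present in the dossier.
    for i, finding in enumerate(report.get("primary_findings", [])):
        ref = finding.get("evidence_ref")
        if ref not in dossier_refs:
            problems.append(f"finding {i} cites unknown evidence ref: {ref!r}")
    return problems


# Illustrative data only, not taken from a real dossier.
example_refs = {"ev-1", "ev-2"}
good_report = {
    "schema": "modulewarden.audit_report.v1",
    "verdict": "quarantine",
    "primary_findings": [{"evidence_ref": "ev-1", "summary": "new postinstall script"}],
}
bad_report = {
    "schema": "modulewarden.audit_report.v1",
    "verdict": "maybe",
    "primary_findings": [{"evidence_ref": "ev-9", "summary": "uncited claim"}],
}

print(check_report(good_report, example_refs))
print(check_report(bad_report, example_refs))
```

A check like this only validates shape and citation integrity; it says nothing about whether the verdict itself is right, which remains the deterministic gate's job.
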
## Results (measured 2026-05-30, held-out only, greedy decode)

40 held-out cases the adapter never saw (12 gold-block, 28 gold-flag), sampled from the rich-neutral set:

| Metric | Tuned LoRA auditor |
|---|---|
| In-schema audit report | 100.0% (40/40) |
| Refuses / declines | 0.0% |
| Verdict-match (exact allow / quarantine / block) | 100.0% (40/40) |
| Block-recall (gold=block called block) | 100.0% (12/12) |
| Flag-recall (gold in block/quarantine flagged) | 100.0% (28/28) |

### Read this before quoting the number

These are 40 clean, in-distribution cases with **zero adversarial or evasion cases** (`n_adversarial = 0`). 100% across the board is a real fix over the v1 collapse (block-recall was 0.0), but it is **not** a production malware-detection rate. A broader and adversarial evaluation, and the cross-config ranking, are still pending. Do not read this as "100% detection." In production the deterministic gate owns the verdict; this adapter writes the cited report and matches the gate on what we have tested.

Why an abliterated base: a stock instruct model refuses to read and describe malicious npm code, and the auditor has to. The base is pre-abliterated with the Arditi refusal-direction method; the prompts are security-analysis framing, not jailbreaks.

## How to load (PEFT)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "huihui-ai/Huihui-Qwen3.6-27B-abliterated"
adapter = "ademczuk/modulewarden-auditor-qwen3.6-27b-lora"

tok = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base,
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)
```

## Serving

- **vLLM**: serves the adapter directly. `--enable-lora --lora-modules mw=ademczuk/modulewarden-auditor-qwen3.6-27b-lora`.
- **llama.cpp**: merge the adapter first, then convert the merged model to GGUF.
  Qwen3.6 is a Gated DeltaNet plus Gated Attention hybrid, so a current llama.cpp build with the qwen3next operators is required.

## Training

- Base: `huihui-ai/Huihui-Qwen3.6-27B-abliterated` (a qwen3_5 vision-language model, loaded text-only to skip the vision tower).
- Method: LoRA r16, alpha 32, dropout 0.05 on `q/k/v/o/gate/up/down_proj`.
- Data: ModuleWarden audit dossiers from real GHSA cve_diff cases, plus added neutral and allow examples (the rich-neutral set) to remove the verdict skew that collapsed v1.
- Config: the `neutral-r16-lr1e4` run from the collapse-fix sweep (learning rate 1e-4), best by block-recall.
- Hardware: 4x A100-SXM-64GB on CINECA Leonardo, bf16.

## Limitations

- 40-case held-out eval, cve_diff plus neutral cases, **zero adversarial cases tested**. The 100% figures do not transfer to a claim about evasive or novel malware.
- Cross-config ranking is not finished; this is the current best by block-recall, not a final selection.
- In production the deterministic gate owns the verdict. This adapter can describe a risk the gate did not flag, and cannot override a verdict.
- License inherits the Qwen3.6 base via the huihui base model.

## Project

ModuleWarden is an auditable npm supply-chain gate built for the Zero-One Hack Vienna 2026 Sybilion Forecast lane. A forecast ranks dependencies by growth trajectory so reviewers vet the climbing ones first, a deterministic gate detects the known-bad, and this adapter calls and narrates the verdict into a git-committed Control Evidence Memo.
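
For anyone re-scoring this adapter on their own held-out set, the metric names used in the results (verdict-match, block-recall, flag-recall) can be computed from gold and predicted verdict pairs roughly as follows. This is a sketch under one reading of those definitions, not the project's evaluation script: it assumes flag-recall counts a gold block-or-quarantine case as caught when the prediction is block or quarantine, and the denominators used in this card may be defined differently. The function names and the toy data are hypothetical.

```python
FLAGGED = {"quarantine", "block"}


def recall(pairs, gold_set, pred_set):
    """Share of cases with gold in gold_set whose prediction is in pred_set."""
    relevant = [(g, p) for g, p in pairs if g in gold_set]
    if not relevant:
        return None  # metric is undefined when no gold case qualifies
    hits = sum(1 for _, p in relevant if p in pred_set)
    return hits / len(relevant)


def score(pairs):
    """Compute the three verdict metrics from (gold, predicted) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (gold, predicted) pair")
    return {
        "verdict_match": sum(g == p for g, p in pairs) / len(pairs),
        "block_recall": recall(pairs, {"block"}, {"block"}),
        # Assumed definition: a flagged gold case counts if it is flagged at all.
        "flag_recall": recall(pairs, FLAGGED, FLAGGED),
    }


# Toy pairs for illustration only; these are not the held-out results.
toy_pairs = [
    ("block", "block"),
    ("quarantine", "block"),
    ("quarantine", "quarantine"),
    ("allow", "allow"),
]
toy_scores = score(toy_pairs)
print(toy_scores)
```

In the toy data the second case is flagged but with the wrong exact verdict, so it lowers verdict-match without lowering flag-recall. That gap is why the two numbers are worth reporting separately.
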