vitorallo committed on
Commit 4927f6f · verified · 1 Parent(s): d281097

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +33 -122
  2. model.safetensors +1 -1
README.md CHANGED
@@ -22,154 +22,65 @@ datasets:

  # securereview-7b-mlx-4bit

- A 4-bit quantised, MLX-native fine-tune of Qwen2.5-Coder-7B-Instruct for
- function-level security code review. Feed it a function, get back JSON
- findings with severity, category, CWE, line number, and a description.
- Runs on Apple Silicon with ~7 GB of memory.

- Trained on 13,484 examples across 9 languages (Python, JavaScript,
- TypeScript, Go, Java, Ruby, Rust, C++, C#) from CVEFixes, synthetic
  generation, real vulnerable applications, and community rule sets.
  All training data is permissively licensed.

  ## Benchmarks

- ### Out-of-distribution (real vulnerable apps)

- Tested against 33 unique vulnerable functions from 8 deliberately
- vulnerable applications (DVNA, NodeGoat, pygoat, crAPI, DSVW, WebGoat,
- RailsGoat, Juice Shop):

- **Recall: 33/33 (100%)**

  | Category | Recall |
  |----------|--------|
- | SQL Injection | 6/6 (100%) |
- | Command Injection | 3/3 (100%) |
- | SSRF | 2/2 (100%) |
- | Path Traversal | 2/2 (100%) |
- | Broken Access Control | 5/5 (100%) |
- | IDOR | 7/7 (100%) |
- | Code Injection | 1/1 (100%) |
- | Open Redirect | 1/1 (100%) |
- | Insecure Deserialization | 1/1 (100%) |
- | Broken Authentication | 1/1 (100%) |
-
- ### In-distribution (200-example test split)
-
- | Metric | Base Qwen | securereview-7b (M6) |
- |--------|----------:|---------------------:|
- | F1 | 14.3% | **44%+** |
- | FPR | 70.3% | **<3%** |
- | Logic bug recall | 29.2% | **37%+** |
- | Pattern bug recall | 28.9% | **59%+** |
-
- ### Auth-context awareness
-
- M6 understands the `Auth:` context line added by scanners that perform
- auth-coverage analysis. When the prompt includes
- `Auth: NONE -- no auth decorator or middleware protects this endpoint`,
- the model correctly flags IDOR and broken access control on small route
- handlers that previous versions missed.
-
- ## How to use

  ```python
  from mlx_lm import load, generate

  model, tok = load("vitorallo/securereview-7b-mlx-4bit")
-
- # Belt-and-braces stop-token patch for older mlx-lm versions
  if hasattr(tok, "eos_token_ids") and 151645 not in tok.eos_token_ids:
      tok.eos_token_ids.add(151645)
-
- SYSTEM = (
-     "You are a JSON API that performs security code review. "
-     "You only output valid JSON. Never output markdown, explanations, "
-     "or text outside JSON."
- )
-
- USER = """Project: Express.js, 12 route handlers, 0 auth functions, 4 data sinks.
- Auth coverage: 0 of 12 route handlers protected. 12 unprotected.
-
- Analyze this function for security vulnerabilities.
-
- Function: get_user
- File: app/api/users.py:1-3
- Role: ROUTE_HANDLER
- Auth: NONE -- no auth decorator or middleware protects this endpoint
- Calls: db.execute
-
- Code:
- ```
- def get_user(user_id):
-     return db.execute(f"SELECT * FROM users WHERE id='{user_id}'")
- ```
-
- Respond with ONLY a valid JSON object:
- {"findings": [{"severity": "HIGH|MEDIUM|LOW", "category": "...", "line": integer, "code": "...", "description": "...", "recommendation": "...", "confidence": 0.0-1.0, "cwe_id": "CWE-xxx"}]}
- If no vulnerabilities: {"findings": []}"""
-
- prompt = tok.apply_chat_template(
-     [{"role": "system", "content": SYSTEM}, {"role": "user", "content": USER}],
-     add_generation_prompt=True, tokenize=False,
- )
- print(generate(model, tok, prompt=prompt, max_tokens=512))
  ```

- The prompt format matters. The model was trained with `mask_prompt: true`
- on a specific prompt structure. Key fields: `Function`, `File`, `Role`,
- `Auth`, `Called by`, `Calls`, and a triple-backticked `Code` block.
- Deviating from this structure recovers base-model behaviour.
-
- Full prompt specification: [docs/m3_inference_contract.md](https://github.com/vitorallo/securereview-7b/blob/main/docs/m3_inference_contract.md)

  ## Training

- - **Base**: mlx-community/Qwen2.5-Coder-7B-Instruct-4bit
- - **Method**: QLoRA via mlx-lm lora
- - **LoRA**: rank 8, scale 1.0, 8 transformer layers
- - **Optimizer**: AdamW, lr 1e-4
- - **Batch**: 4 effective (1 x 4 grad accumulation), max_seq_length 4096
- - **Epochs**: 1 (~2,654 iterations on 10,616 train records)
- - **Val loss**: 1.530 -> 0.203
- - **Hardware**: Apple Silicon, ~1 hour, 7.4 GB peak memory
-
- The adapter is deliberately light (rank 8, 1 epoch) to preserve the base
- model's security reasoning while teaching the output format and
- auth-context awareness.
-
- ## Training data
-
- 13,484 records from 11 sources:
-
- | Source | Records | Description |
- |--------|--------:|-------------|
- | CVEFixes | 7,846 | Real vulnerable functions from CVE patch commits |
- | Synthetic | 2,728 | LLM-generated pairs from 162 security rules |
- | Tool-use | 2,700 | Multi-turn tool-calling examples |
- | Investigation | 500 | Verdict examples (confirmed/dismissed/uncertain) |
- | Vulnapp | 119 | Real functions from 8 vulnerable applications |
- | IDOR/auth | 55 | Auth-context-aware IDOR and clean route handler examples |
-
- All records include `Auth:` context and `Project:` preamble lines
- matching the M11 scanner prompt format.
-
- All permissively licensed. No CC-BY-NC or share-alike data.
-
- ## Iteration history
-
- | Version | Test F1 | FPR | Real-code recall | Change |
- |---------|--------:|----:|----------------:|--------|
- | base | 14% | 70% | -- | No fine-tuning |
- | M3 | 61% | 0% | 26% | Anti-memorisation, description templates |
- | M4 | 66% | 0% | 0% (prod) | Tool-use format; over-suppressed |
- | M5 | 44% | 2.6% | 88% | Lighter adapter, real vulnapp data |
- | **M6** | **44%+** | **<3%** | **100%** | Auth-context injection, IDOR training examples |

  ## Links

- - Code + pipeline: [github.com/vitorallo/securereview-7b](https://github.com/vitorallo/securereview-7b)
  - License: Apache-2.0

  ## Citation
 

  # securereview-7b-mlx-4bit

+ A 4-bit MLX fine-tune of Qwen2.5-Coder-7B-Instruct for function-level
+ security code review. Input: a code function. Output: structured JSON
+ findings with severity, category, CWE, line number, and description.
+ Runs on Apple Silicon, ~8 GB memory.

+ Trained on 13,484 examples across 9 languages from CVEFixes, synthetic
  generation, real vulnerable applications, and community rule sets.
  All training data is permissively licensed.

  ## Benchmarks

+ Tested against 33 vulnerable functions from 8 deliberately vulnerable
+ applications (DVNA, NodeGoat, pygoat, crAPI, DSVW, WebGoat, RailsGoat,
+ Juice Shop):

+ | Metric | Base Qwen | securereview-7b |
+ |--------|----------:|----------------:|
+ | Vulnapp recall | -- | **94% (31/33)** |
+ | FPR (clean code) | 70% | **<3%** |
+ | F1 (test split) | 14% | **44%** |

+ Detection by category:

  | Category | Recall |
  |----------|--------|
+ | SQL Injection | 100% |
+ | Command Injection | 100% |
+ | SSRF | 100% |
+ | Path Traversal | 100% |
+ | Broken Access Control | 100% |
+ | IDOR | 86% |
+ | Insecure Deserialization | 100% |
+ | Broken Authentication | 100% |
+
+ ## Quick start

  ```python
  from mlx_lm import load, generate

  model, tok = load("vitorallo/securereview-7b-mlx-4bit")
  if hasattr(tok, "eos_token_ids") and 151645 not in tok.eos_token_ids:
      tok.eos_token_ids.add(151645)
  ```

+ The model expects a structured prompt with `Function`, `File`, `Role`,
+ `Auth`, `Code` fields and a JSON format reminder. See
+ [docs/m3_inference_contract.md](https://github.com/vitorallo/securereview-7b/blob/main/docs/m3_inference_contract.md)
+ for the full prompt specification.

  ## Training

+ - **Base**: Qwen2.5-Coder-7B-Instruct-4bit
+ - **Method**: QLoRA, rank 8, 8 layers, 1 epoch, lr 1e-4
+ - **Data**: 13,484 records, 9 languages, multi-rule prompts (2-8 rules per function)
+ - **Hardware**: Apple Silicon, ~1 hour

  ## Links

+ - [Code + pipeline](https://github.com/vitorallo/securereview-7b)
  - License: Apache-2.0

  ## Citation
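Both versions of the README say the model returns JSON findings, and the removed prompt template lists the exact fields. The following is a minimal Python sketch of how a caller might validate `generate()` output before acting on it. It assumes that template is the whole output contract; the authoritative contract is docs/m3_inference_contract.md. `parse_findings` and its constants are hypothetical helpers and are not part of mlx_lm or the securereview-7b repository.

```python
import json
import re

# Field names and allowed values are taken from the JSON template in the
# README prompt; this validator itself is a hypothetical helper.
SEVERITIES = {"HIGH", "MEDIUM", "LOW"}
CWE_PATTERN = re.compile(r"^CWE-\d+$")
REQUIRED_FIELDS = {
    "severity": str,
    "category": str,
    "line": int,
    "code": str,
    "description": str,
    "recommendation": str,
    "confidence": (int, float),
    "cwe_id": str,
}


def parse_findings(raw: str) -> list[dict]:
    """Parse model output and check it against the assumed findings schema.

    Raises ValueError on malformed output; returns [] for {"findings": []}.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("findings"), list):
        raise ValueError('expected an object with a "findings" list')

    for i, finding in enumerate(data["findings"]):
        if not isinstance(finding, dict):
            raise ValueError(f"finding {i} is not an object")
        for field, expected in REQUIRED_FIELDS.items():
            value = finding.get(field)
            # bool is a subclass of int, so reject it explicitly
            if not isinstance(value, expected) or isinstance(value, bool):
                raise ValueError(f"finding {i}: bad or missing {field!r}")
        if finding["severity"] not in SEVERITIES:
            raise ValueError(f"finding {i}: unknown severity {finding['severity']!r}")
        if not 0.0 <= finding["confidence"] <= 1.0:
            raise ValueError(f"finding {i}: confidence outside [0, 1]")
        if not CWE_PATTERN.match(finding["cwe_id"]):
            raise ValueError(f"finding {i}: malformed cwe_id {finding['cwe_id']!r}")
    return data["findings"]


if __name__ == "__main__":
    # Hand-written sample in the documented shape (not real model output).
    sample = ('{"findings": [{"severity": "HIGH", "category": "SQL Injection", '
              '"line": 2, "code": "db.execute(...)", "description": "...", '
              '"recommendation": "...", "confidence": 0.9, "cwe_id": "CWE-89"}]}')
    print(parse_findings(sample)[0]["cwe_id"])  # prints CWE-89
```

The sketch rejects malformed output outright instead of trying to repair it. That way a scanner never silently acts on a half-parsed response. A caller could catch `ValueError` and retry generation instead.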
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2b166500f9fb7c022aaf180d0db66c4f69189cd51d15c55ae4a428d533f04911
  size 4284346187

  version https://git-lfs.github.com/spec/v1
+ oid sha256:0f66cdb775a7537c7837f99701d35ef0d7df7a3aba23cdb64c0306a68b0bc03f
  size 4284346187
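`model.safetensors` is tracked with Git LFS, so the diff shows only its pointer file. In that file, `oid` is the SHA-256 of the file contents and `size` is its length in bytes. This commit changes the oid but keeps the size at 4,284,346,187 bytes, so a size check alone cannot tell the old weights from the new ones. The following is a minimal sketch of checking a local copy against the new pointer. The helper names are hypothetical, and the local file path is an assumption.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split a Git LFS pointer file into its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the oid and size the pointer records."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first; hashing a ~4 GB file is slow.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in chunks so the whole file is never held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Pointer contents after this commit, copied from the diff.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:0f66cdb775a7537c7837f99701d35ef0d7df7a3aba23cdb64c0306a68b0bc03f
size 4284346187
"""

# Example (the path is an assumption): matches_pointer("model.safetensors", NEW_POINTER)
```

For example, the path could be the one returned by `huggingface_hub.hf_hub_download(repo_id="vitorallo/securereview-7b-mlx-4bit", filename="model.safetensors")`.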