---
license: other
tags:
- vulnerability-detection
- security
- code-analysis
- llama
language:
- code
---

# cod4-agent1: meta-llama/Llama-3.1-8B-Instruct fine-tuned su DiverseVul

**Modello base**: `meta-llama/Llama-3.1-8B-Instruct` (licenza ereditata).

**Macro-categoria**: `cod4` -- Gestione degli Errori e Stato

**CWE target**:
- CWE-401 (Memory Leak)
- CWE-476 (NULL Pointer Dereference)
- CWE-703 (Exception Handling)

## Ruolo nell'ensemble

- Agente `A1` del trittico `cod4`.
- Peso nel voto pesato: **0.5**.
- Checkpoint di origine: `best`.

## Formato di output

Una singola riga in italiano:

```
verdetto: VULN, cwe: CWE-XXX
verdetto: SAFE, cwe: N/A
```

## Inferenza consigliata

- Precisione: BF16 + Flash Attention 2.
- Decoding: greedy (`do_sample=False`, `repetition_penalty=1.05`).
- `max_new_tokens=24`, `force_prefix="verdetto:"`.

## Smoke test (post-merge, 20 sample)

| Metrica | Valore |
|---|---|
| Parse success rate | 1.000 |
| Accuracy binaria | 0.750 |
| Recall VULNERABLE | 0.600 |
| Recall SAFE | 0.900 |
| F1 macro | 0.750 |
| CWE accuracy@1 (sui VULN ground truth) | 0.200 |
| Sample falliti (parse o errore) | 0/20 |

Le metriche sono indicative su 20 sample stratificati (10 SAFE + 10 VULN)
estratti deterministicamente (seed=42) dal test set
`data/test/cod4`. Per metriche complete riferirsi
al run di evaluation su test set held-out completo.

## Pipeline di inferenza

Questo modello e' uno dei 12 agenti dell'ensemble multi-modello
`vuln-detect-ensemble`. Viene caricato dal `model_registry` in coppia con
gli altri 2 agenti del trittico cod4 (max 1 trittico residente
in VRAM per volta).