---
language: py
tags:
- vulnerability-detection
- code-security
- codebert
- python
- CWE-89
- CWE-78
- CWE-79
- CWE-352
- CWE-94
- CWE-22
- CWE-601
datasets:
- vudenc
license: cc-by-4.0
---

# PyGuard V4 — Python Vulnerability Detector

## Model Description
PyGuard V4 is a fine-tuned Microsoft CodeBERT model that detects
security vulnerabilities in Python code. It builds on VUDENC
(Wartschinski et al. 2022), replacing VUDENC's Word2Vec embeddings and
LSTM classifier with CodeBERT.

## Performance vs VUDENC

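| Metric    | VUDENC (LSTM) | PyGuard V2 (CodeBERT) | Improvement (pp) |
|-----------|---------------|-----------------------|------------------|
| Precision | 82-96%        | 100.00%               | +4 to +18        |
| Recall    | 78-87%        | 100.00%               | +13 to +22       |
| F1 Score  | 80-90%        | 100.00%               | +10 to +20       |
| Accuracy  | N/A           | 100.00%               | N/A              |

VUDENC figures are the ranges reported across its per-vulnerability-type
models. Improvements are absolute differences in percentage points (pp),
not relative gains.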
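The Improvement column is the absolute gap, in percentage points, between PyGuard's score and the upper and lower ends of each VUDENC range. The short check below reproduces that arithmetic from the table values; the variable and function names are illustrative only.

```python
# VUDENC ranges (min, max) in percent, copied from the comparison table
vudenc = {
    "Precision": (82, 96),
    "Recall": (78, 87),
    "F1 Score": (80, 90),
}
pyguard = 100.0  # PyGuard score on every metric, in percent


def improvement_range(low, high, new):
    """Absolute gain in percentage points over the best and worst VUDENC result."""
    return (new - high, new - low)


gains = {m: improvement_range(lo, hi, pyguard) for m, (lo, hi) in vudenc.items()}
for metric, (g_min, g_max) in gains.items():
    print(f"{metric}: +{g_min:.0f} to +{g_max:.0f} pp")
```

A gain measured against the best VUDENC model gives the lower bound, and against the weakest model the upper bound.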

## Training Dataset
- **Source:** VUDENC Dataset by Wartschinski et al. 2022
- **DOI:** 10.5281/zenodo.3559841
- **Paper:** *Information and Software Technology*, vol. 144, 2022
- **Total samples:** 2,457 (1,228 vulnerable + 1,229 safe)
- **Split:** 80% train, 10% val, 10% test
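The card gives the 80/10/10 ratios but not how the split was drawn. A minimal sketch, assuming a label-stratified random split so that each part keeps the roughly 50/50 vulnerable/safe balance; `stratified_split` and the placeholder snippets are hypothetical, not the training pipeline actually used.

```python
import random


def stratified_split(samples, labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (sample, label) pairs into train/val/test, preserving class balance.

    Hypothetical helper: stratifying by label is an assumption, not
    something the model card states.
    """
    rng = random.Random(seed)
    splits = ([], [], [])
    for label in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(idx)
        n_train = int(len(idx) * ratios[0])
        n_val = int(len(idx) * ratios[1])
        parts = (idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:])
        for split, part in zip(splits, parts):
            split.extend((samples[i], labels[i]) for i in part)
    for split in splits:
        rng.shuffle(split)  # mix the classes within each split
    return splits


# 1,228 vulnerable (label 1) + 1,229 safe (label 0) placeholder samples
labels = [1] * 1228 + [0] * 1229
samples = [f"snippet_{i}" for i in range(len(labels))]
train, val, test = stratified_split(samples, labels)
print(len(train), len(val), len(test))
```

Because the per-class counts are floored, the leftover samples land in the test split, which therefore ends up a few samples larger than the validation split.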

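## Vulnerabilities Detected (7 CWEs)
The training data covers the seven vulnerability types below. The
classifier itself outputs a binary vulnerable/safe label; it does not
predict which CWE applies.
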
- CWE-89: SQL Injection
- CWE-78: Command Injection
- CWE-79: Cross-Site Scripting (XSS)
- CWE-352: Cross-Site Request Forgery (CSRF)
- CWE-94: Code Injection (Remote Code Execution)
- CWE-22: Path Traversal (Path Disclosure)
- CWE-601: Open Redirect
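To make the vulnerable/safe labels concrete, here is a hand-written pair in the style of a CWE-89 sample. It is illustrative only and not drawn from the VUDENC dataset; the table and function names are hypothetical. The first lookup concatenates user input into SQL, the second uses a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])


def lookup_vulnerable(name):
    # CWE-89: user input is concatenated straight into the SQL string
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def lookup_safe(name):
    # Parameterized query: sqlite3 binds `name` as data, never as SQL
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"
print(lookup_vulnerable(payload))  # leaks every row
print(lookup_safe(payload))        # []
```

The two functions differ only in how the query is built, which is the kind of token-level difference a code model has to pick up on.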

## Architecture
- Base model: microsoft/codebert-base
- Classification head: Linear(768, 2) with Dropout(0.3)
- Pooling: Mean pooling over the last hidden state
- Max sequence length: 256 tokens
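Read literally, these bullets describe the forward pass below for an input of at most $L = 256$ tokens, where $\mathbf{h}_i \in \mathbb{R}^{768}$ are CodeBERT's last-layer hidden states. The card does not say whether padding positions are excluded from the mean; this sketch assumes they are, via the attention mask $m_i \in \{0, 1\}$.

$$
\mathbf{p} = \frac{\sum_{i=1}^{L} m_i \, \mathbf{h}_i}{\sum_{i=1}^{L} m_i},
\qquad
\mathbf{z} = W \, \operatorname{Dropout}_{0.3}(\mathbf{p}) + \mathbf{b},
\qquad
W \in \mathbb{R}^{2 \times 768},\; \mathbf{b} \in \mathbb{R}^{2}
$$

$\operatorname{softmax}(\mathbf{z})$ then gives the two class probabilities. Dropout is only active during training; at inference $\mathbf{z} = W\mathbf{p} + \mathbf{b}$.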

## Citation
```bibtex
@article{wartschinski2022vudenc,
  title={VUDENC: Vulnerability Detection with Deep Learning
         on a Natural Codebase for Python},
  author={Wartschinski, Laura and Noller, Yannic and
          Vogel, Thomas and Kehrer, Timo and Grunske, Lars},
  journal={Information and Software Technology},
  volume={144},
  year={2022}
}
```