---
license: mit
datasets:
- rogue-security/prompt-injections-benchmark
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-cased
tags:
- security
- prompt
- injection
---
# LLM-Defense (english)

This is a simple classifier meant to filter out common attack vectors for LLMs. 

## Uses

The main usecase for this in AI agents. This model is best used as a gate between a outside input (via email, text, etc) and the
inner model (Opus, Codex, etc) that actually will run the prompts. This is not a catchall for all of the attacks, but it akin to making
sure the doors are locked to your house.