---
license: mit
datasets:
- rogue-security/prompt-injections-benchmark
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-cased
tags:
- security
- prompt
- injection
---

# LLM-Defense (english)

This is a simple classifier meant to filter out common attack vectors for LLMs.

## Uses

The main use case for this model is in AI agents. It is best used as a gate between an outside input (email, text, etc.) and the inner model (Opus, Codex, etc.) that will actually run the prompts.

This is not a catch-all for every attack, but it is akin to making sure the doors to your house are locked.
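
The gating pattern described above might be sketched as follows. This is a minimal illustration, not the model's documented API: the function names (`load_classifier`, `is_safe`, `gated_call`), the `INJECTION` label name, and the `0.5` threshold are all assumptions, and the model id is left as a parameter because this card does not state one. Check the model's `id2label` mapping for the real label names before relying on it.

```python
from typing import Callable, Dict, List, Optional

# A classifier takes a string and returns [{"label": ..., "score": ...}],
# the shape a Hugging Face text-classification pipeline returns for one input.
Classifier = Callable[[str], List[Dict[str, object]]]

# ASSUMPTION: the label emitted for malicious input. The real name depends on
# the model's config (id2label) and may differ, e.g. "LABEL_1".
INJECTION_LABEL = "INJECTION"


def load_classifier(model_id: str) -> Classifier:
    """Build a classifier from a Hugging Face model id (requires `transformers`)."""
    # Imported lazily so the gate logic below has no hard dependency.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def is_safe(text: str, classifier: Classifier, threshold: float = 0.5) -> bool:
    """Return False when the top prediction is an injection at or above the threshold."""
    top = classifier(text)[0]
    flagged = top["label"] == INJECTION_LABEL and float(top["score"]) >= threshold
    return not flagged


def gated_call(
    text: str,
    classifier: Classifier,
    inner_model: Callable[[str], str],
) -> Optional[str]:
    """Forward text to the inner model only if the classifier lets it through."""
    if not is_safe(text, classifier):
        return None  # blocked: the outside input never reaches the inner model
    return inner_model(text)
```

In an agent, `inner_model` would be whatever function sends the prompt to the downstream LLM, and a `None` result signals that the input was dropped. Blocked inputs are worth logging, since a classifier of this kind will produce some false positives and false negatives.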