---
license: apache-2.0
language:
- en
- ru
base_model:
- jhu-clsp/mmBERT-small
pipeline_tag: zero-shot-classification
tags:
- gliner2
- safety
- pii
- ai-security
- zero-shot
- text-classification
- zero-shot-classification
- span-categorization
- token-classification
- guardrails
---

# GLiNER Guard: Unified Multitask Guardrail

One encoder model that replaces your entire guardrail stack: safety classification, PII detection, adversarial attack detection, and intent and tone analysis, all in a single forward pass.

[Paper (arXiv:2605.05277)](https://arxiv.org/abs/2605.05277)
[GLiNER Guard v1 collection](https://huggingface.co/collections/hivetrace/gliner-guard-v1)

**145M params · GLiNER2 · bi-encoder · multilingual ModernBERT (mmBERT) · zero-shot classification, NER, and more · no LLM required**

## Installation

Install the dependencies. For now, install from our fork; we will update this section once the PR to the GLiNER2 repo is merged.

```bash
pip install "gliner2 @ git+https://github.com/bogdanminko/GLiNER2.git@feature/bi-encoder"
```

## Usage

Classify harmful messages and detect PII in a single forward pass:

```python
from gliner2 import GLiNER2

# Load the model from the Hugging Face Hub
model = GLiNER2.from_pretrained("hivetrace/gliner-guard-biencoder")
# Enable label caching
model.config.cache_labels = True

PII_LABELS = ["person", "location", "email", "phone"]
SAFETY_LABELS = ["safe", "unsafe"]

# One schema combines span extraction (PII) with classification (safety)
schema = (
    model.create_schema()
    .entities(entity_types=PII_LABELS, threshold=0.4)
    .classification(task="safety", labels=SAFETY_LABELS)
)

result = model.extract(
    "Send $500 to John Smith at john.smith@gmail.com or I'll leak your photos",
    schema=schema,
)
print(result)
```

Output:

```
{'entities': {'person': ['John Smith'],
              'location': [],
              'email': ['john.smith@gmail.com'],
              'phone': []},
 'safety': 'unsafe'}
```

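
The returned dictionary can be post-processed with plain Python. The sketch below assumes the result has the shape shown in the output above: entity labels map to lists of matched strings, and the `safety` task maps to a single label. `redact_pii` and `is_blocked` are hypothetical helper names for illustration, not part of GLiNER2.

```python
def redact_pii(text: str, result: dict) -> str:
    """Replace every extracted entity string in `text` with a [LABEL] placeholder."""
    spans = [
        (value, label)
        for label, values in result.get("entities", {}).items()
        for value in values
        if value  # skip empty strings, which would match everywhere
    ]
    # Replace longer matches first so a short entity cannot break up a longer one.
    for value, label in sorted(spans, key=lambda s: len(s[0]), reverse=True):
        text = text.replace(value, f"[{label.upper()}]")
    return text


def is_blocked(result: dict) -> bool:
    """Treat a message as blocked when the safety task returned 'unsafe'."""
    return result.get("safety") == "unsafe"


# Result copied from the example output above.
result = {
    "entities": {
        "person": ["John Smith"],
        "location": [],
        "email": ["john.smith@gmail.com"],
        "phone": [],
    },
    "safety": "unsafe",
}
message = "Send $500 to John Smith at john.smith@gmail.com or I'll leak your photos"

print(redact_pii(message, result))  # Send $500 to [PERSON] at [EMAIL] or I'll leak your photos
print(is_blocked(result))           # True
```

String replacement is the simplest option when only the matched text is available; if your pipeline also has character offsets for each span, replacing by offset avoids redacting unrelated occurrences of the same string.
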
## Supported Tasks

GLiNER Guard is purpose-built for 6 guardrail tasks that share one encoder, with no LLM required.\
Thanks to zero-shot generalization, it can also handle custom labels outside the training taxonomy.

| Task | Type | Label count | Key labels |
|------|------|-------------|------------|
| **Safety** | single-label | 2 | `safe` `unsafe` |
| **PII / NER** | span extraction | 32 | `person` `email` `phone` `card_number` `address` |
| **Adversarial Detection** | multi-label | 15 | `jailbreak_persona` `prompt_injection` `instruction_override` `data_exfiltration` |
| **Harmful Content** | multi-label | 30 | `hate_speech` `violence` `child_exploitation` `fraud` `pii_exposure` |
| **Intent** | single-label | 13 | `informational` `adversarial` `threatening` `solicitation` |
| **Tone of Voice** | single-label | 10 | `neutral` `aggressive` `manipulative` `deceptive` |

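
All six tasks can in principle be combined into one schema. The following is a minimal sketch, not a confirmed recipe. It reuses `create_schema()`, `.entities(...)`, and `.classification(...)` from the Usage example. The `multi_label=True` flag is an assumption based on upstream GLiNER2 and may differ in the fork. The task names (`intent`, `tone_of_voice`, `adversarial`, `harmful_content`) and the helper `build_guard_schema` are hypothetical, and the label lists are abbreviated to the key labels from the table.

```python
# Label lists abbreviated to the key labels column; swap in the full lists as needed.
SINGLE_LABEL_TASKS = {
    "safety": ["safe", "unsafe"],
    "intent": ["informational", "adversarial", "threatening", "solicitation"],
    "tone_of_voice": ["neutral", "aggressive", "manipulative", "deceptive"],
}
MULTI_LABEL_TASKS = {
    "adversarial": [
        "jailbreak_persona", "prompt_injection",
        "instruction_override", "data_exfiltration",
    ],
    "harmful_content": [
        "hate_speech", "violence", "child_exploitation", "fraud", "pii_exposure",
    ],
}
PII_LABELS = ["person", "email", "phone", "card_number", "address"]


def build_guard_schema(model, pii_threshold: float = 0.4):
    """Build one schema covering all six tasks for an already loaded model."""
    schema = model.create_schema().entities(
        entity_types=PII_LABELS, threshold=pii_threshold
    )
    for task, labels in SINGLE_LABEL_TASKS.items():
        schema = schema.classification(task=task, labels=labels)
    for task, labels in MULTI_LABEL_TASKS.items():
        # ASSUMPTION: multi_label=True is the flag for multi-label tasks,
        # as in upstream GLiNER2; verify against the installed version.
        schema = schema.classification(task=task, labels=labels, multi_label=True)
    return schema
```

With a loaded model, `model.extract(text, schema=build_guard_schema(model))` would then run every task in one call, as in the Usage example.
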
<details>
<summary><b>Safety</b> (2 labels)</summary>

Classifies whether a message is safe or unsafe. Single-label.

```python
SAFETY_LABELS = ["safe", "unsafe"]
```

| Label | Description |
|-------|-------------|
| `safe` | Message does not contain harmful or policy-violating content |
| `unsafe` | Message contains harmful, dangerous, or policy-violating content |

</details>

<details>
<summary><b>NER / PII</b> (all 32 entity types)</summary>

Span extraction across 7 groups. Use labels from this list for best results; out-of-taxonomy labels may work via zero-shot generalization but are not benchmarked.

| Group | Labels |
|-------|--------|
| **Person** | `person` `first_name` `last_name` `alias` `title` |
| **Location** | `country` `region` `city` `district` `street` `building` `unit` `postal_code` `landmark` `address` |
| **Organization** | `company` `government` `education` `media` `product` |
| **Contact** | `email` `phone` `social_account` `messenger` |
| **Identity** | `passport` `national_id` `document_id` |
| **Temporal** | `date_of_birth` `event_date` |
| **Financial** | `card_number` `bank_account` `crypto_wallet` |

```python
PII_LABELS = [
    "person", "first_name", "last_name", "alias", "title",
    "country", "region", "city", "district", "street",
    "building", "unit", "postal_code", "landmark", "address",
    "company", "government", "education", "media", "product",
    "email", "phone", "social_account", "messenger",
    "passport", "national_id", "document_id",
    "date_of_birth", "event_date",
    "card_number", "bank_account", "crypto_wallet",
]
```

</details>

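
When fine-grained entity types are more detail than a downstream policy needs, extracted values can be rolled up into the 7 groups from the table. This is a minimal sketch that assumes the `entities` part of the result maps each label to a list of strings, as in the Usage output; `PII_GROUPS` and `group_entities` are hypothetical names.

```python
# Group membership copied from the NER / PII table.
PII_GROUPS = {
    "person": ["person", "first_name", "last_name", "alias", "title"],
    "location": ["country", "region", "city", "district", "street",
                 "building", "unit", "postal_code", "landmark", "address"],
    "organization": ["company", "government", "education", "media", "product"],
    "contact": ["email", "phone", "social_account", "messenger"],
    "identity": ["passport", "national_id", "document_id"],
    "temporal": ["date_of_birth", "event_date"],
    "financial": ["card_number", "bank_account", "crypto_wallet"],
}
# Reverse index: entity label -> group name.
LABEL_TO_GROUP = {
    label: group for group, labels in PII_GROUPS.items() for label in labels
}


def group_entities(entities: dict) -> dict:
    """Collect extracted values per group; out-of-taxonomy labels go under 'other'."""
    grouped = {}
    for label, values in entities.items():
        if values:  # ignore labels with no matches
            grouped.setdefault(LABEL_TO_GROUP.get(label, "other"), []).extend(values)
    return grouped


print(group_entities({
    "first_name": ["John"],
    "email": ["john.smith@gmail.com"],
    "phone": [],
    "city": ["Berlin"],
}))
```
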
<details>
<summary><b>Adversarial Detection</b> (all 15 labels)</summary>

Detects attacks against LLM-based systems. Multi-label: a single message can combine multiple attack vectors.

| Subgroup | Labels |
|----------|--------|
| **Jailbreak** | `jailbreak_persona` `jailbreak_hypothetical` `jailbreak_roleplay` |
| **Injection** | `prompt_injection` `indirect_prompt_injection` `instruction_override` |
| **Extraction** | `data_exfiltration` `system_prompt_extraction` `context_manipulation` `token_manipulation` |
| **Advanced** | `tool_abuse` `social_engineering` `multi_turn_escalation` `schema_poisoning` |
| **Clean** | `none` |

```python
ADVERSARIAL_LABELS = [
    "jailbreak_persona", "jailbreak_hypothetical", "jailbreak_roleplay",
    "prompt_injection", "indirect_prompt_injection", "instruction_override",
    "data_exfiltration", "system_prompt_extraction", "context_manipulation", "token_manipulation",
    "tool_abuse", "social_engineering", "multi_turn_escalation", "schema_poisoning",
    "none",
]
```

</details>

<details>
<summary><b>Harmful Content</b> (all 30 labels)</summary>

Detects harmful content categories. Multi-label: a message can belong to multiple categories simultaneously.

| Subgroup | Labels |
|----------|--------|
| **Interpersonal** | `harassment` `hate_speech` `discrimination` `doxxing` `bullying` |
| **Violence & Danger** | `violence` `dangerous_instructions` `weapons` `drugs` `self_harm` |
| **Sexual & Exploitation** | `sexual_content` `child_exploitation` `grooming` `sextortion` |
| **Deception** | `fraud` `scam` `social_engineering` `impersonation` |
| **Sensitive Topics** | `profanity` `extremism` `political` `war` `espionage` `cybersecurity` `religious` `lgbt` |
| **Information** | `misinformation` `copyright_violation` `pii_exposure` |
| **Clean** | `none` |

```python
HARMFUL_LABELS = [
    "harassment", "hate_speech", "discrimination", "doxxing", "bullying",
    "violence", "dangerous_instructions", "weapons", "drugs", "self_harm",
    "sexual_content", "child_exploitation", "grooming", "sextortion",
    "fraud", "scam", "social_engineering", "impersonation",
    "profanity", "extremism", "political", "war", "espionage", "cybersecurity", "religious", "lgbt",
    "misinformation", "copyright_violation", "pii_exposure",
    "none",
]
```

</details>

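
Both multi-label tasks (Adversarial Detection and Harmful Content) include a clean label, `none`, which usually has to be filtered out before acting on a prediction. The sketch below assumes a multi-label prediction arrives as a list of label strings; this card only shows single-label output, so check the actual shape returned by your version. `active_labels` and `is_flagged` are hypothetical helper names.

```python
def active_labels(prediction, clean_label: str = "none") -> list:
    """Return the predicted labels with the clean label removed.

    Accepts a single label string, a list of labels, or None.
    """
    if isinstance(prediction, str):
        labels = [prediction]
    else:
        labels = list(prediction or [])
    return [label for label in labels if label != clean_label]


def is_flagged(prediction) -> bool:
    """A message is flagged when any label other than the clean one was predicted."""
    return bool(active_labels(prediction))


print(active_labels(["prompt_injection", "none"]))  # ['prompt_injection']
print(is_flagged("none"))                           # False
```
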
<details>
<summary><b>Intent</b> (all 13 labels)</summary>

Classifies the intent behind a message. Single-label.

| Group | Labels |
|-------|--------|
| Benign | `informational` `instructional` `conversational` `persuasive` `creative` `transactional` `emotional_support` `testing` |
| Ambiguous | `ambiguous` `extractive` |
| Malicious | `adversarial` `threatening` `solicitation` |

```python
INTENT_LABELS = [
    "informational", "instructional", "conversational", "persuasive",
    "creative", "transactional", "emotional_support", "testing",
    "ambiguous", "extractive",
    "adversarial", "threatening", "solicitation",
]
```

</details>

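
The three intent groups map naturally onto a coarse routing decision. The following sketch is one possible policy, not something the model prescribes: the `allow` / `review` / `block` actions and the helper `route_by_intent` are illustrative assumptions, and unknown labels fall back to `review`.

```python
# Group membership copied from the Intent table.
INTENT_GROUPS = {
    "benign": ["informational", "instructional", "conversational", "persuasive",
               "creative", "transactional", "emotional_support", "testing"],
    "ambiguous": ["ambiguous", "extractive"],
    "malicious": ["adversarial", "threatening", "solicitation"],
}
INTENT_TO_GROUP = {
    label: group for group, labels in INTENT_GROUPS.items() for label in labels
}
# Hypothetical policy: which action to take for each intent group.
GROUP_TO_ACTION = {"benign": "allow", "ambiguous": "review", "malicious": "block"}


def route_by_intent(intent: str) -> str:
    """Map a predicted intent label to an action; unknown labels go to review."""
    group = INTENT_TO_GROUP.get(intent, "ambiguous")
    return GROUP_TO_ACTION[group]


print(route_by_intent("threatening"))    # block
print(route_by_intent("informational"))  # allow
```
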
<details>
<summary><b>Tone of Voice</b> (all 10 labels)</summary>

Classifies the tone of a message. Single-label.

| Label | Description |
|-------|-------------|
| `neutral` | Matter-of-fact, no strong emotional coloring |
| `formal` | Professional or official register |
| `humorous` | Playful, joking, or light-hearted |
| `sarcastic` | Ironic or mocking tone |
| `distressed` | Anxious, upset, or overwhelmed |
| `confused` | Unclear intent, disoriented phrasing |
| `pleading` | Urgent requests, begging for help or compliance |
| `aggressive` | Hostile, confrontational, or threatening |
| `manipulative` | Attempts to exploit, deceive, or coerce |
| `deceptive` | Deliberately misleading or false framing |

```python
TOV_LABELS = [
    "neutral", "formal", "humorous", "sarcastic",
    "distressed", "confused", "pleading",
    "aggressive", "manipulative", "deceptive",
]
```

</details>

## Citation

```bibtex
@misc{minko2026glinerguardunifiedencoder,
      title={GLiNER Guard: Unified Encoder Family for Production LLM Safety and Privacy},
      author={Bogdan Minko and Sabrina Sadiekh and Evgeniy Kokuykin},
      year={2026},
      eprint={2605.05277},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2605.05277},
}
```