opir-multilang-onnx

opir-multilang-onnx is an ONNX export of knowledgator/opir-multitask-multilang-v1.0 (GLiClass uni-encoder over microsoft/mdeberta-v3-base), packaged as an offline, multilingual content-safety classifier with a frozen taxonomy baked into the graph. It was produced for AgentGuard but can be used standalone from any programming language that has ONNX Runtime bindings.

What it does

Scores text against a fixed label set and returns one logit per label. The candidate labels are prepended to the text as <<LABEL>>l1<<LABEL>>l2…<<SEP>>text and run through a single mDeBERTa-v3 forward pass (GLiClass uni-encoder); each label's pooled hidden state is scored. Decision: P(label) = sigmoid(logit), block iff max P over the harm labels >= threshold.

Frozen V1 taxonomy. The block decision is over 6 harm categories:

toxicity, hate speech, violence, sexual content, self-harm, harassment

The graph bakes in a 7th label, safe and benign, as label 0. GLiClass scores all labels jointly in one forward pass (they cross-attend through the encoder), so this sentinel is essential for calibration: it absorbs benign probability mass. It is excluded from the block decision (prefix.json lists only the 6 harm labels under unsafe_labels). Omitting the sentinel from the label prefix inflates both recall and false positives.
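The decision rule can be sketched as follows. This is a minimal illustration, not the AgentGuard implementation: the function names are hypothetical, and the order of the six harm labels is assumed to match the list above (only "safe first" is stated here; prefix.json is the authority for the real order).

```python
import math

# Label 0 is the safe sentinel (stated in this card). The order of the six
# harm labels below is an ASSUMPTION; read the real order from prefix.json.
LABELS = [
    "safe and benign",
    "toxicity", "hate speech", "violence",
    "sexual content", "self-harm", "harassment",
]
UNSAFE_LABELS = LABELS[1:]  # the sentinel never takes part in the decision


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decide(logits, threshold=0.5):
    """Return (blocked, top_harm_label, top_harm_probability) for one text."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    top = max(UNSAFE_LABELS, key=probs.__getitem__)
    return probs[top] >= threshold, top, probs[top]
```

Note that a high sentinel logit does not veto a block on its own: only the maximum harm probability is compared with the threshold.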

The label prefix is constant, so its token-id sequence is precomputed and shipped in prefix.json; an integrator only needs to SentencePiece-encode the variable text and assemble prefix_ids ++ spm(text) ++ [SEP].
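A minimal sketch of that assembly step, assuming the text has already been encoded to ids. The function name build_inputs is hypothetical, and the prefix used in the usage comment is a toy value, not the real content of prefix.json.

```python
SEP_ID = 2  # [SEP], from the special ids listed in this card


def build_inputs(prefix_ids, text_ids):
    """Assemble prefix_ids ++ spm(text) ++ [SEP] plus an all-ones attention mask.

    prefix_ids: the precomputed [CLS] <<LABEL>>...<<SEP>> id prefix from prefix.json
    text_ids:   SentencePiece ids of the variable text only (no special tokens)
    """
    input_ids = list(prefix_ids) + list(text_ids) + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    return input_ids, attention_mask


# Usage with toy ids (NOT the real prefix):
#   ids, mask = build_inputs([1, 250102, 11, 250103], [101, 102])
# One way to obtain text_ids, assuming the `sentencepiece` package:
#   sp = sentencepiece.SentencePieceProcessor(model_file="spm.model")
#   text_ids = sp.encode(text)
```

The key under which prefix.json stores the prefix ids is not stated here, so the sketch takes the list as an argument instead of guessing a field name.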

Files

| File | Size | Notes |
| --- | --- | --- |
| model.onnx | ~1.12 GB | fp32 graph, logits[batch, 7] (label 0 = safe sentinel) |
| model_fp16.onnx | ~561 MB | fp16, numerically near-identical to fp32 (max ΔP(unsafe) 0.0003); default |
| spm.model | ~4.3 MB | stock microsoft/mdeberta-v3-base SentencePiece (250k multilingual vocab) |
| prefix.json | <1 KB | baked labels (7, safe first) + unsafe_labels (6 harm) + precomputed [CLS] <<LABEL>>…<<SEP>> id prefix + special ids |

Inputs: input_ids (int64), attention_mask (int64). Output: logits ([batch, 7] - the safe sentinel plus the 6 harm labels). Special ids: [CLS]=1, [SEP]=2, <<LABEL>>=250102, <<SEP>>=250103, pad=0.
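The tensor contract above could be exercised roughly like this. It is a sketch under assumptions: right-padding with pad=0 is assumed (the padding side is not stated here), pad_batch and run_batch are hypothetical helper names, and run_batch needs the numpy and onnxruntime packages plus a local copy of the model file.

```python
PAD_ID = 0  # pad id from the special ids listed in this card


def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad id sequences to a common length; the mask is 1 on real tokens."""
    if not sequences:
        raise ValueError("empty batch")
    max_len = max(len(s) for s in sequences)
    input_ids = [list(s) + [pad_id] * (max_len - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask


def run_batch(model_path, sequences):
    """Run one padded batch through the graph and return logits of shape [batch, 7]."""
    # Imported lazily so pad_batch stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    ids, mask = pad_batch(sequences)
    feeds = {
        "input_ids": np.asarray(ids, dtype=np.int64),
        "attention_mask": np.asarray(mask, dtype=np.int64),
    }
    (logits,) = session.run(["logits"], feeds)
    return logits
```

In a long-running service the session would normally be created once and reused rather than rebuilt on every call.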

Threshold

Default 0.5 (shipped in prefix.json). Per-deployment tunable: the false-positive rate is somewhat threshold-sensitive on this multilingual model (unlike the English Opir variant), e.g. Hindi toxicity moves from 56% recall / 16% FPR at 0.5 to 36% / 4% at 0.8.
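To pick a threshold for a given deployment, one option is to sweep candidate values over a labelled sample and compare recall against false-positive rate. The sketch below assumes each score is the maximum harm probability for one text; the function name and the toy data in the usage comment are illustrative only and are not results for this model.

```python
def recall_and_fpr(scores, is_unsafe, threshold):
    """Recall and false-positive rate when blocking iff score >= threshold.

    scores:    max harm probability per text
    is_unsafe: ground-truth booleans, aligned with scores
    """
    tp = fn = fp = tn = 0
    for score, unsafe in zip(scores, is_unsafe):
        blocked = score >= threshold
        if unsafe:
            tp += int(blocked)
            fn += int(not blocked)
        else:
            fp += int(blocked)
            tn += int(not blocked)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Usage on toy data (illustrative values only):
#   scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]
#   truth  = [True, True, True, False, False, False]
#   for t in (0.5, 0.8):
#       print(t, recall_and_fpr(scores, truth, t))
```

Raising the threshold trades recall for a lower false-positive rate, which is the same trade-off as in the Hindi figures above.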

Positioning

This is an offline / sovereign multilingual content-safety guard. It fills a gap that English-only injection classifiers cannot (they score ~0% recall on non-English text) and that cloud content-safety APIs serve only through per-call requests. It is not a prompt-injection specialist and is not intended to replace a mature cloud content-safety product on that product's own categories. What it provides is genuine offline non-English toxicity coverage (≈40–76% recall at 16–36% FPR on textdetox/multilingual_toxicity_dataset across de/es/ru/ar/zh/hi), free and PII-safe, since no text leaves the host.

Attribution

Derivative of knowledgator/opir-multitask-multilang-v1.0 (Apache-2.0) using Microsoft's mdeberta-v3-base SentencePiece tokenizer. ONNX export, fp16 conversion, and frozen-taxonomy packaging by AgentGuard (Apache-2.0). The frozen taxonomy and the int-id prefix are the only additions; the model weights are unchanged.

Citation

If you use this model, please cite the original Opir work:

@misc{stepanov2026opirefficientmultitasksafety,
      title={Opir: Efficient Multi-Task Safety Classification for Toxicity, Jailbreaks, Hate Speech, and Harmful Content},
      author={Ihor Stepanov and Aleksandr Smechov},
      year={2026},
      eprint={2605.29659},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.29659},
}