ood-editguard-qwen3: OOD AI-edit detector (Qwen3)
Detect AI-edited text with an out-of-distribution detector on a Qwen3 backbone. Human text is modeled as the in-distribution; AI-edited and AI-generated text are flagged as outliers, giving a continuous "how-AI-edited" score.
Usage
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
# Base checkpoint that the LoRA adapter in this repo is applied to
base = "Qwen/Qwen3-1.7B-Base"
repo = "reneeice/ood-editguard-qwen3-0.6b"
tok = AutoTokenizer.from_pretrained(repo)
# Load the base model in bf16, then attach the adapter weights from this repo
backbone = PeftModel.from_pretrained(
    AutoModel.from_pretrained(base, torch_dtype=torch.bfloat16), repo)
# OOD head file, downloaded from the repo beforehand
head = torch.load("ood_head.pt", map_location="cpu")
# score(text) = orientation * ||proj(meanpool(backbone(text))) - center||^2
Higher score = more AI-edited. Calibrate a threshold on your own data.
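The scoring formula in the comment above, and the advice to calibrate a threshold, can be sketched together. This is a minimal pure-Python illustration under stated assumptions, not the repository's code: `mean_pool`, `ood_score`, and `threshold_at_fpr` are hypothetical names, the projection is passed in as a plain function, and picking the threshold from a target false-positive rate on known-human text is only one possible calibration strategy.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(token_vecs: Sequence[Vector], mask: Sequence[int]) -> Vector:
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [v for v, m in zip(token_vecs, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


def ood_score(token_vecs: Sequence[Vector], mask: Sequence[int],
              proj: Callable[[Vector], Vector], center: Vector,
              orientation: float = 1.0) -> float:
    """orientation * ||proj(meanpool(hidden_states)) - center||^2."""
    z = proj(mean_pool(token_vecs, mask))
    return orientation * sum((a - b) ** 2 for a, b in zip(z, center))


def threshold_at_fpr(human_scores: Sequence[float], target_fpr: float) -> float:
    """Pick an observed score t so that at most target_fpr of human scores exceed t."""
    if not human_scores or not 0.0 <= target_fpr < 1.0:
        raise ValueError("need human scores and 0 <= target_fpr < 1")
    ranked = sorted(human_scores)
    allowed = int(target_fpr * len(ranked))  # human texts we accept flagging
    return ranked[len(ranked) - 1 - allowed]
```

With a threshold `t` chosen this way, a text would be flagged when its score is strictly greater than `t`. The human scores used for calibration should come from your own domain, since the score scale is not guaranteed to transfer.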
Performance
Validation on pangram/editlens_iclr (held-out):
| Metric | Value |
|---|---|
| AUROC (AI vs human) | 0.910 |
| AUPR | 0.952 |
| correlation with edit-magnitude | +0.730 |
A random detector scores AUROC 0.5.
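To make the 0.5 baseline concrete: AUROC is the probability that a randomly chosen AI-edited text receives a higher score than a randomly chosen human text, with ties counted as half. The sketch below computes it by direct pairwise comparison. It is an illustration only (the function name `auroc` and the label convention 1 = AI, 0 = human are assumptions), and it is quadratic in the number of examples, so a library routine is preferable for large evaluation sets.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # AI-edited (assumed label 1)
    neg = [s for s, y in zip(scores, labels) if y == 0]  # human (assumed label 0)
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A detector that assigns every text the same score ties on every pair and therefore lands at exactly 0.5.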
The project behind this model
This model is one of a family of three, the end of a single research thread that started from a classic question (can you tell human text from machine text?) and ended at a more realistic one: how much did AI edit this text, and can we trust that judgement?
The journey, start to finish:
1. Reproduce "Human Texts Are Outliers." We first reproduced the core claim of arXiv:2510.08602 (NeurIPS 2025): instead of training a binary human-vs-machine classifier, model machine text as the in-distribution and treat human text as out-of-distribution (OOD), an anomaly to be detected by distance from a learned center (DeepSVDD). A minimal end-to-end run on the RAID dataset hit AUROC 0.94, matching the paper.
2. Meet EditLens. Binary detection is the wrong frame for the common case: people lightly edit their own drafts with AI. EditLens (Thai et al., 2025) reframes detection as a continuous "extent of AI editing" score in [0,1], and the community editlens-qwen3-*-repro models (search HF: editlens qwen3 repro) bring it to a modern Qwen3 backbone.
3. Apply the OOD idea to the edit-detection setting. The insight of this work: take the OOD framing from step 1 and apply it to the edit-detection problem of step 2, on Qwen3. We pursued three concrete ways to do that and shipped all three as a family:
| Model | What it is | Use it when |
|---|---|---|
| ood-editguard-qwen3-0.6b (you are here) | Standalone OOD AI-edit detector: a Qwen3 backbone fine-tuned (QLoRA) with an out-of-distribution head; outputs a continuous "how AI-edited" score. | You want one self-contained model that scores text end-to-end. |
| editlens-ood-adapter-qwen3-0.6b | Tiny OOD adapter (a few MB) that snaps onto a frozen EditLens-Qwen3 (search HF: editlens qwen3 repro) checkpoint to add an anomaly / human-likeness score, with no backbone training. | You already run EditLens and want to add an OOD score cheaply. |
| editlens-ood-selective-guard-qwen3 | Reliability guard for selective prediction: an OOD gate that abstains on inputs unlike the training distribution so the edit-score isn't trusted blindly. | You need calibrated, low-false-positive decisions and can abstain on hard cases. |
Why three? They trade off cost and integration. Taking the table rows in order as A, B, and C: A (ood-editguard) is a standalone model, B (the OOD adapter) is a cheap add-on to an existing EditLens deployment, and C (the selective guard) wraps either with an abstain-on-uncertainty safety layer. Pick the one that matches how you deploy.
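The abstain-on-uncertainty layer can be pictured as a gate placed in front of an edit decision. The sketch below is purely illustrative and does not reflect the selective-guard model's actual interface: `selective_predict`, `coverage`, `gate_limit`, and `decision_threshold` are hypothetical names, and using a single hard cutoff on an OOD gate score is an assumed design.

```python
from typing import Optional, Sequence


def selective_predict(edit_score: float, gate_score: float,
                      gate_limit: float, decision_threshold: float) -> Optional[bool]:
    """Return None (abstain) when the input looks unlike the training data,
    otherwise True for "AI-edited" or False for "not AI-edited"."""
    if gate_score > gate_limit:  # too far from the training distribution
        return None
    return edit_score >= decision_threshold


def coverage(decisions: Sequence[Optional[bool]]) -> float:
    """Fraction of inputs on which the guard did not abstain."""
    if not decisions:
        raise ValueError("no decisions given")
    return sum(d is not None for d in decisions) / len(decisions)
```

Tightening `gate_limit` lowers coverage in exchange for decisions made only on inputs the detector has reason to be reliable on, which is the trade-off selective prediction exposes.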
One thing we learned the hard way
Our first frozen-embedding run scored an AUROC of 0.32: not random, but inverted. On the EditLens embedding space the geometry is the opposite of the original RAID setup: human/clean text is the compact in-distribution and heavily-AI-edited text is the outlier (its embeddings are organized around extent of editing, not authorship). We flipped the in-distribution definition, switched from full Mahalanobis to a shrinkage-regularized / Euclidean distance on frozen features, and added an auto-orientation step that fixes the score's sign on a held-out slice so a detector is never reported upside-down. That correction is baked into this family.
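The auto-orientation step can be sketched as follows. Negating every score turns an AUROC of a into 1 - a, so on the same scores a sign flip alone would turn 0.32 into 0.68. This is a minimal illustration under assumptions, not the released code: `fit_orientation` is a hypothetical name, labels are assumed to be 1 for AI-edited and 0 for human, and the sign is chosen from held-out AUROC alone.

```python
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise AUROC: P(positive outscores negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_orientation(heldout_scores: Sequence[float],
                    heldout_labels: Sequence[int]) -> float:
    """+1.0 if higher raw scores already mean "more AI-edited" on the
    held-out slice, -1.0 if the raw score is upside-down."""
    return 1.0 if auroc(heldout_scores, heldout_labels) >= 0.5 else -1.0


def orient(scores: Sequence[float], orientation: float) -> List[float]:
    """Apply the fitted sign so that higher always means more AI-edited."""
    return [orientation * s for s in scores]
```

The sign should be fitted on a slice that is separate from the data used to report metrics, otherwise the reported AUROC can never fall below 0.5 and an inverted detector would go unnoticed.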
How it was trained
- Backbone: Qwen/Qwen3-1.7B-Base, bf16 + LoRA (rank 8, all attn+MLP projections).
- Head: a small LayerNorm+Linear projection trained in full, with a DeepSVDD one-class objective: pull human embeddings toward a center c, push AI embeddings away. Score = oriented squared distance to c.
- Supervision: edit-magnitude buckets from cosine_score (thresholds 0.03/0.15).
- Compute: a single GPU, minutes.
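The supervision and objective listed above can be sketched in a few lines. Several details here are assumptions rather than facts from the training code: that a larger cosine_score means more editing, that the two thresholds define three buckets with the lowest treated as the human side, and that the "push away" term is a hinge on squared distance with a hypothetical `margin` parameter. The function names are also illustrative.

```python
from bisect import bisect_right
from typing import Sequence

THRESHOLDS = (0.03, 0.15)  # values from the card; bucket meanings are assumed


def edit_bucket(cosine_score: float) -> int:
    """0: below 0.03, 1: from 0.03 up to (not including) 0.15, 2: 0.15 or above."""
    return bisect_right(THRESHOLDS, cosine_score)


def sq_dist(z: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between an embedding and the center."""
    return sum((a - b) ** 2 for a, b in zip(z, c))


def one_class_loss(embeddings: Sequence[Sequence[float]],
                   is_human: Sequence[bool],
                   center: Sequence[float],
                   margin: float = 1.0) -> float:
    """Mean loss: humans pay their squared distance to the center;
    AI examples pay only if they sit closer than `margin` (assumed hinge)."""
    if not embeddings:
        raise ValueError("empty batch")
    total = 0.0
    for z, human in zip(embeddings, is_human):
        d = sq_dist(z, center)
        total += d if human else max(0.0, margin - d)
    return total / len(embeddings)
```

In an actual training loop the same expression would be written on tensors so that gradients reach the projection head and the LoRA weights; the list-based version only shows which direction each class is moved relative to the center.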
License
Apache-2.0. Built on Qwen/Qwen3-*-Base. The supervision labels derive from the gated pangram/editlens_iclr dataset; please honor its terms. Method credit: Human Texts Are Outliers (2510.08602) and EditLens (2510.03154).