---
license: apache-2.0
language: en
tags:
- ai-detection
- ai-edit-detection
- out-of-distribution
- ood-detection
- content-integrity
- qwen3
- deepsvdd
pipeline_tag: text-classification
arxiv:
- 2510.08602
- 2510.03154
base_model: Qwen/Qwen3-1.7B-Base
---

# ood-editguard-qwen3-1.7b: OOD AI-edit detector

**Detect AI-edited text with an out-of-distribution detector on a Qwen3-1.7B backbone.**
Human text is modeled as the in-distribution class; AI-edited and AI-generated text are flagged as outliers, giving a continuous "how-AI-edited" score.

## Performance

Validation on `pangram/editlens_iclr` (held-out, 2,400 rows):

| Metric | Value |
|---|---|
| **AUROC** (AI vs human) | **0.955** |
| AUPR | 0.977 |
| correlation with edit-magnitude | +0.723 |
| mean score (AI) | 3.194 |
| mean score (human) | 0.044 |

A random detector scores AUROC 0.5. The 1.7B model improves over the 0.6B version (AUROC 0.941→0.955, AUPR 0.969→0.977, correlation +0.661→+0.723).

## Usage

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

# Use a GPU when one is available; CPU works but is much slower.
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "reneeice/ood-editguard-qwen3-1.7b"
base = "Qwen/Qwen3-1.7B-Base"

# Load the tokenizer, then the bf16 base model with the LoRA adapter applied.
tok = AutoTokenizer.from_pretrained(model_name, use_fast=True)
backbone = PeftModel.from_pretrained(
    AutoModel.from_pretrained(base, torch_dtype=torch.bfloat16).to(device),
    model_name
).eval()

# The OOD head holds the projection weights, the center, and the score orientation.
head = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/reneeice/ood-editguard-qwen3-1.7b/resolve/main/ood_head.pt",
    map_location="cpu"
)
hidden = 2048  # Qwen3-1.7B hidden size
proj = nn.Sequential(
    nn.LayerNorm(hidden, dtype=torch.float32),
    nn.Linear(hidden, head["out_dim"], bias=False, dtype=torch.float32),
).to(device)
proj.load_state_dict(head["proj"])
center = head["center"].to(device)
orientation = int(head["orientation"])

def ai_edit_score(texts):
    """Return oriented OOD distance; higher = more AI-edited."""
    enc = tok(texts, truncation=True, max_length=512,
              padding=True, return_tensors="pt")
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        h = backbone(**enc).last_hidden_state
        # Mean-pool over real tokens only; padding is masked out.
        mask = enc["attention_mask"].unsqueeze(-1).to(h.dtype)
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        # Project in float32, L2-normalize, then score by squared distance to the center.
        z = proj(pooled.float())
        z = F.normalize(z, dim=-1)
        return (orientation * ((z - center) ** 2).sum(-1)).tolist()

print(ai_edit_score(["A human-written sentence.",
                     "This was entirely generated by an AI language model."]))
```

Higher score = more AI-edited. Calibrate a threshold on your own data.

## How it was trained

- **Backbone:** `Qwen/Qwen3-1.7B-Base`, bf16 + LoRA (rank 8, all attention and MLP projections).
- **Head:** a small LayerNorm+Linear projection trained in full, with a DeepSVDD one-class objective: pull **human** embeddings toward a center `c`, push AI embeddings away. Score = oriented squared distance to `c`.
- **Data:** 4,000 rows from `pangram/editlens_iclr` (1 epoch).
- **Supervision:** edit-magnitude buckets from `cosine_score` (thresholds 0.03/0.15).
- **Compute:** single NVIDIA A40, ~10 minutes.

## The project behind this model

This model is one of a **family** applying the OOD framing of [Human Texts Are Outliers](https://arxiv.org/abs/2510.08602) (NeurIPS 2025) to the [EditLens](https://arxiv.org/abs/2510.03154) continuous AI-edit detection task.

| Model | Size | AUROC | Approach |
|---|---|---|---|
| [ood-editguard-qwen3-0.6b](https://huggingface.co/reneeice/ood-editguard-qwen3-0.6b) | 0.6B | 0.941 | Trained OOD head |
| **ood-editguard-qwen3-1.7b** ← you are here | 1.7B | **0.955** | Trained OOD head |
| [editlens-ood-adapter-qwen3-0.6b](https://huggingface.co/reneeice/editlens-ood-adapter-qwen3-0.6b) | 0.6B | 0.688 | Frozen-embedding adapter |

## Limitations

- English text; best on inputs of roughly a paragraph or more (very short snippets are noisier).
- The score reflects *degree of AI editing*, not authorship intent or quality.
- Can be affected by domain shift; calibrate the threshold on data resembling your own.
- Like all detectors, not immune to adversarial paraphrasing.

## License

Apache-2.0. Built on `Qwen/Qwen3-1.7B-Base`. The supervision labels derive from the gated [`pangram/editlens_iclr`](https://huggingface.co/datasets/pangram/editlens_iclr) dataset; please honor its terms.
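
## Calibrating a threshold (sketch)

The Usage and Limitations sections both recommend calibrating a threshold on your own data. Below is a minimal sketch of one way to do that, assuming you have already run `ai_edit_score` on a held-out set of texts you know to be human-written and a set you know to be AI-edited. It picks the threshold as a high quantile of the human scores, so the false-positive rate on human text stays near a chosen target. The helper names `threshold_at_fpr` and `rates`, the 10% target, and the score values are illustrative placeholders, not part of this model's release or its evaluation.

```python
import numpy as np

def threshold_at_fpr(human_scores, target_fpr=0.05):
    """Return the score above which roughly `target_fpr` of human texts fall."""
    human = np.asarray(human_scores, dtype=float)
    return float(np.quantile(human, 1.0 - target_fpr))

def rates(human_scores, ai_scores, threshold):
    """Return (false-positive rate, true-positive rate) at `threshold`."""
    human = np.asarray(human_scores, dtype=float)
    ai = np.asarray(ai_scores, dtype=float)
    fpr = float((human > threshold).mean())  # humans wrongly flagged
    tpr = float((ai > threshold).mean())     # AI-edited texts correctly flagged
    return fpr, tpr

# Placeholder scores standing in for ai_edit_score(...) outputs on labeled data.
human_scores = [0.01, 0.02, 0.03, 0.05, 0.04, 0.10, 0.02, 0.06, 0.08, 0.90]
ai_scores = [2.9, 3.4, 1.2, 3.8, 0.5, 3.1, 2.2, 3.6]

thr = threshold_at_fpr(human_scores, target_fpr=0.10)
fpr, tpr = rates(human_scores, ai_scores, thr)
print(f"threshold={thr:.3f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Fixing the false-positive rate on human text is only one possible criterion; if wrongly flagging human writers is less costly in your setting than missing AI edits, a threshold chosen to hit a target recall on AI-edited text may fit better.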