---
license: apache-2.0
language: en
tags:
- ai-detection
- ai-edit-detection
- out-of-distribution
- ood-detection
- content-integrity
- qwen3
- deepsvdd
pipeline_tag: text-classification
arxiv:
- 2510.08602
- 2510.03154
base_model: Qwen/Qwen3-1.7B-Base
---

# ood-editguard-qwen3-1.7b — OOD AI-edit detector

**Detect AI-edited text with an out-of-distribution detector on a Qwen3-1.7B backbone.**
Human text is modeled as the in-distribution class; AI-edited and AI-generated text are flagged
as outliers, which yields a continuous "how AI-edited" score.

## Performance

Validation on a held-out split of `pangram/editlens_iclr` (2,400 rows):

| Metric | Value |
|---|---|
| **AUROC** (AI vs human) | **0.955** |
| AUPR | 0.977 |
| correlation with edit-magnitude | +0.723 |
| mean score — AI | 3.194 |
| mean score — human | 0.044 |

For reference, a random detector scores an AUROC of 0.5. The 1.7B model improves on the 0.6B version
on all three metrics (AUROC 0.941→0.955, AUPR 0.969→0.977, edit-magnitude correlation +0.661→+0.723).
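
To make the AUROC figure concrete, the sketch below computes it from two lists of scores using the pairwise definition: the probability that a randomly chosen AI text scores higher than a randomly chosen human text. It is not the evaluation script behind the table, and the toy scores are invented for illustration.

```python
def auroc(human_scores, ai_scores):
    """Fraction of (AI, human) pairs where the AI score is higher; ties count half."""
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Toy scores, invented for illustration only.
human = [0.02, 0.05, 0.10, 0.40]
ai = [0.30, 1.20, 2.80, 3.50]
print(auroc(human, ai))  # 0.9375 (15 of 16 pairs ranked correctly)
```

A detector that assigns scores at random ranks about half of the pairs correctly, which is where the 0.5 baseline comes from.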

## Usage

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU works but is slow
model_name = "reneeice/ood-editguard-qwen3-1.7b"
base = "Qwen/Qwen3-1.7B-Base"

tok = AutoTokenizer.from_pretrained(model_name, use_fast=True)
backbone = PeftModel.from_pretrained(
    AutoModel.from_pretrained(base, torch_dtype=torch.bfloat16).to(device),
    model_name
).eval()

head = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/reneeice/ood-editguard-qwen3-1.7b/resolve/main/ood_head.pt",
    map_location="cpu"
)

hidden = 2048  # Qwen3-1.7B hidden size
proj = nn.Sequential(
    nn.LayerNorm(hidden, dtype=torch.float32),
    nn.Linear(hidden, head["out_dim"], bias=False, dtype=torch.float32),
).to(device)
proj.load_state_dict(head["proj"])
center = head["center"].to(device)
orientation = int(head["orientation"])

def ai_edit_score(texts):
    """Return the oriented OOD distance for each text; higher means more AI-edited."""
    enc = tok(texts, truncation=True, max_length=512, padding=True, return_tensors="pt")
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        h = backbone(**enc).last_hidden_state
        # Mean-pool over real tokens only; padding positions are masked out.
        mask = enc["attention_mask"].unsqueeze(-1).to(h.dtype)
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        # Project in float32, then L2-normalize before measuring distance.
        z = proj(pooled.float())
        z = F.normalize(z, dim=-1)
        # Squared distance to the center, signed by the stored orientation.
        return (orientation * ((z - center) ** 2).sum(-1)).tolist()

print(ai_edit_score(["A human-written sentence.", "This was entirely generated by an AI language model."]))
```

Higher scores indicate heavier AI editing. The score is a distance, not a probability, so choose a decision threshold on labeled data from your own domain.
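
One way to pick that threshold is to fix a tolerable false-positive rate on text you know to be human-written. The sketch below is an illustration, not part of the released code: `threshold_at_fpr`, `flag`, and the 25% target are hypothetical, and the scores are invented stand-ins for the output of `ai_edit_score`.

```python
import math


def threshold_at_fpr(human_scores, target_fpr=0.05):
    """Smallest observed cutoff that flags at most target_fpr of the human scores."""
    if not human_scores:
        raise ValueError("need at least one human score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(human_scores)
    # Number of human texts we are willing to flag by mistake.
    allowed = math.floor(target_fpr * len(ranked))
    # Texts are flagged when score > threshold, so use the (allowed + 1)-th largest score.
    return ranked[len(ranked) - allowed - 1]


def flag(scores, threshold):
    """True where a score exceeds the calibrated threshold."""
    return [s > threshold for s in scores]


# Invented stand-ins for ai_edit_score(known_human_texts).
human_scores = [0.01, 0.03, 0.06, 0.50]
cutoff = threshold_at_fpr(human_scores, target_fpr=0.25)
print(cutoff)                            # 0.06
print(flag([0.50, 0.02, 3.10], cutoff))  # [True, False, True]
```

With only a handful of human samples the cutoff is coarse; a larger calibration set from the target domain gives a more stable estimate.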

## How it was trained

- **Backbone:** `Qwen/Qwen3-1.7B-Base`, bf16 + LoRA (rank 8, all attention and MLP projections).
- **Head:** a small LayerNorm+Linear projection trained in full, with a DeepSVDD
  one-class objective: pull **human** embeddings toward a center `c`, push AI
  embeddings away. Score = oriented squared distance to `c`.
- **Data:** 4,000 rows from `pangram/editlens_iclr` (1 epoch).
- **Supervision:** edit-magnitude buckets from `cosine_score` (thresholds 0.03/0.15).
- **Compute:** single NVIDIA A40, ~10 minutes.
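
The supervision and objective described above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the training code: it assumes a higher `cosine_score` means a heavier edit, that values exactly on a threshold fall into the upper bucket, and that AI embeddings are pushed away with a hinge on a hypothetical `margin`. How the three buckets actually enter the loss is not specified here, so the sketch only separates bucket 0 from the rest.

```python
from bisect import bisect_right

BUCKET_EDGES = (0.03, 0.15)  # thresholds quoted in the list above


def edit_bucket(cosine_score):
    """0: below 0.03, 1: from 0.03 up to (not including) 0.15, 2: 0.15 and above.

    Boundary handling is an assumption of this sketch.
    """
    return bisect_right(BUCKET_EDGES, cosine_score)


def svdd_style_loss(sq_dists, buckets, margin=1.0):
    """Mean loss over a batch of squared distances to the center c.

    Bucket 0 (treated as human) is pulled toward c; other buckets are pushed
    until their squared distance exceeds `margin` (hypothetical value).
    """
    total = 0.0
    for d, b in zip(sq_dists, buckets):
        if b == 0:
            total += d                     # shrink distance to the center
        else:
            total += max(0.0, margin - d)  # penalize only if still inside the margin
    return total / len(sq_dists)


# Invented values for illustration.
buckets = [edit_bucket(s) for s in (0.01, 0.08, 0.40)]
print(buckets)  # [0, 1, 2]
print(svdd_style_loss([0.2, 0.5, 3.0], buckets))
```

At inference time only the distance is used, which is why the score grows with the amount of AI editing rather than saturating at a class label.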

## The project behind this model

This model is one of a **family** applying the OOD framing of [Human Texts Are Outliers](https://arxiv.org/abs/2510.08602)
(NeurIPS 2025) to the [EditLens](https://arxiv.org/abs/2510.03154) continuous AI-edit detection task.

| Model | Size | AUROC | Approach |
|---|---|---|---|
| [ood-editguard-qwen3-0.6b](https://huggingface.co/reneeice/ood-editguard-qwen3-0.6b) | 0.6B | 0.941 | Trained OOD head |
| **ood-editguard-qwen3-1.7b** ← you are here | 1.7B | **0.955** | Trained OOD head |
| [editlens-ood-adapter-qwen3-0.6b](https://huggingface.co/reneeice/editlens-ood-adapter-qwen3-0.6b) | 0.6B | 0.688 | Frozen-embedding adapter |

## Limitations

- English text; best on inputs of roughly a paragraph or more (very short snippets are noisier).
- The score reflects *degree of AI editing*, not authorship intent or quality.
- Scores can shift under domain shift; calibrate the threshold on data resembling your own.
- Like all detectors, it is not immune to adversarial paraphrasing.

## License

Apache-2.0. Built on `Qwen/Qwen3-1.7B-Base`. The supervision labels derive from
the gated [`pangram/editlens_iclr`](https://huggingface.co/datasets/pangram/editlens_iclr) dataset;
please honor its terms.