ood-editguard-qwen3-1.7b — OOD AI-edit detector

Detect AI-edited text with an out-of-distribution (OOD) detector on a Qwen3-1.7B backbone. Human text is modeled as the in-distribution class; AI-edited and AI-generated text are flagged as outliers, which yields a continuous "how AI-edited" score.

Performance

Validation on pangram/editlens_iclr (held-out split, 2,400 rows):

Metric	Value
AUROC (AI vs human)	0.955
AUPR	0.977
correlation with edit-magnitude	+0.723
mean score — AI	3.194
mean score — human	0.044

A random detector scores AUROC 0.5. The 1.7B model improves over the 0.6B version (AUROC 0.941→0.955, AUPR 0.969→0.977, correlation +0.661→+0.723).
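
To reproduce an AUROC figure on your own labeled scores, a minimal pure-Python sketch follows. It uses the pairwise definition of AUROC: the probability that a randomly chosen AI text scores higher than a randomly chosen human text, with ties counted as half. The function name `auroc` and the toy scores are illustrative assumptions, not part of this repository, and the quadratic loop is only meant for small samples.

```python
def auroc(ai_scores, human_scores):
    """Pairwise AUROC: P(AI score > human score), ties count as 0.5."""
    if not ai_scores or not human_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Toy scores, made up for illustration only (not model outputs).
ai = [3.1, 2.4, 0.9, 0.05]
human = [0.02, 0.04, 0.10, 1.2]
print(auroc(ai, human))
```

A value of 0.5 means the two score distributions are indistinguishable by rank, which is why it serves as the random baseline.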

Usage

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"  # falls back to CPU (slow) when no GPU is available
model_name = "reneeice/ood-editguard-qwen3-1.7b"
base = "Qwen/Qwen3-1.7B-Base"

tok = AutoTokenizer.from_pretrained(model_name, use_fast=True)
backbone = PeftModel.from_pretrained(
    AutoModel.from_pretrained(base, torch_dtype=torch.bfloat16).to(device),
    model_name
).eval()

head = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/reneeice/ood-editguard-qwen3-1.7b/resolve/main/ood_head.pt",
    map_location="cpu"
)

hidden = 2048  # Qwen3-1.7B hidden size
proj = nn.Sequential(
    nn.LayerNorm(hidden, dtype=torch.float32),
    nn.Linear(hidden, head["out_dim"], bias=False, dtype=torch.float32),
).to(device)
proj.load_state_dict(head["proj"])
center = head["center"].to(device)
orientation = int(head["orientation"])

def ai_edit_score(texts):
    """Return oriented OOD distance — higher = more AI-edited."""
    enc = tok(texts, truncation=True, max_length=512, padding=True, return_tensors="pt")
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        h = backbone(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1).to(h.dtype)
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        z = proj(pooled.float())
        z = F.normalize(z, dim=-1)
        return (orientation * ((z - center) ** 2).sum(-1)).tolist()

print(ai_edit_score(["A human-written sentence.", "This was entirely generated by an AI language model."]))

Higher score = more AI-edited. Calibrate a threshold on your own data.
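
One possible way to calibrate a threshold is to fix a target false-positive rate on human-written texts you trust. The sketch below assumes you have already collected a list of scores for such texts; `threshold_at_fpr` and `flag` are hypothetical helper names, and the numbers are placeholders rather than real model outputs.

```python
import math


def threshold_at_fpr(human_scores, target_fpr=0.05):
    """Return a threshold t so that at most target_fpr of human scores exceed t."""
    if not human_scores:
        raise ValueError("need at least one human score")
    ordered = sorted(human_scores)
    n = len(ordered)
    # Number of human texts we tolerate above the threshold.
    allowed = min(math.floor(target_fpr * n), n - 1)
    # Everything strictly above this value gets flagged.
    return ordered[n - allowed - 1]


def flag(scores, threshold):
    """Flag texts whose score is strictly above the threshold."""
    return [s > threshold for s in scores]


# Placeholder scores for known-human texts from your own domain.
human = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.5, 0.9]
t = threshold_at_fpr(human, target_fpr=0.25)
print(t, flag([0.05, 3.2], t))
```

A stricter `target_fpr` flags fewer human texts at the cost of missing more lightly edited ones, so the right value depends on your application.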

How it was trained

Backbone: Qwen/Qwen3-1.7B-Base, bf16 + LoRA (rank 8, all attn+MLP projections).
Head: a small LayerNorm+Linear projection trained in full, with a DeepSVDD one-class objective: pull human embeddings toward a center c, push AI embeddings away. Score = oriented squared distance to c.
Data: 4,000 rows from pangram/editlens_iclr (1 epoch).
Supervision: edit-magnitude buckets from cosine_score (thresholds 0.03/0.15).
Compute: single NVIDIA A40, ~10 minutes.
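
The one-class objective described above can be illustrated with a small pure-Python sketch. The exact training loss is not given in this card, so the hinge with a `margin` for AI rows is an assumption, as are all function names. It also simplifies the edit-magnitude buckets to a binary human/AI label. Treat it as one plausible form of "pull human toward c, push AI away", not the training code.

```python
import math


def normalize(v):
    """L2-normalize a vector, mirroring F.normalize in the usage snippet."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sq_dist(z, c):
    """Squared Euclidean distance between an embedding and the center."""
    return sum((a - b) ** 2 for a, b in zip(z, c))


def one_class_loss(embeddings, is_human, center, margin=1.0):
    """Human rows: squared distance to center. AI rows: hinge max(0, margin - distance)."""
    total = 0.0
    for v, human in zip(embeddings, is_human):
        d = sq_dist(normalize(v), center)
        total += d if human else max(0.0, margin - d)
    return total / len(embeddings)


center = [1.0, 0.0]
# A human row sitting on the center and an AI row far from it give zero loss.
print(one_class_loss([[2.0, 0.0], [0.0, 3.0]], [True, False], center))
# An AI row that lands on the center is penalized by the full margin.
print(one_class_loss([[5.0, 0.0]], [False], center))
```

At inference time only the distance term remains, which matches the score computed by `ai_edit_score` in the usage section.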

The project behind this model

This model is one of a family applying the OOD framing of Human Texts Are Outliers (NeurIPS 2025) to the EditLens continuous AI-edit detection task.

Model	Size	AUROC	Approach
ood-editguard-qwen3-0.6b	0.6B	0.941	Trained OOD head
ood-editguard-qwen3-1.7b ← you are here	1.7B	0.955	Trained OOD head
editlens-ood-adapter-qwen3-0.6b	0.6B	0.688	Frozen-embedding adapter

Limitations

English text; best on inputs of roughly a paragraph or more (very short snippets are noisier).
The score reflects degree of AI editing, not authorship intent or quality.
Scores can shift under domain shift; calibrate the threshold on data resembling your own.
Like all detectors, not immune to adversarial paraphrasing.

License

Apache-2.0. Built on Qwen/Qwen3-1.7B-Base. The supervision labels derive from the gated pangram/editlens_iclr dataset; please honor its terms.

Papers

Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection. arXiv:2510.08602, published Oct 7, 2025.
EditLens: Quantifying the Extent of AI Editing in Text. arXiv:2510.03154, published Oct 3, 2025.