ood-editguard-qwen3-1.7b: OOD AI-edit detector

Detects AI-edited text with an out-of-distribution (OOD) detector on a Qwen3-1.7B backbone. Human text is modeled as in-distribution; AI-edited and AI-generated text are flagged as outliers, which yields a continuous "how AI-edited" score.

Performance

Validation on pangram/editlens_iclr (held-out, 2400 rows):

Metric                           Value
AUROC (AI vs human)              0.955
AUPR                             0.977
Correlation with edit-magnitude  +0.723
Mean score (AI)                  3.194
Mean score (human)               0.044

A random detector scores AUROC 0.5. The 1.7B model improves over the 0.6B version (AUROC 0.941 → 0.955, AUPR 0.969 → 0.977, correlation +0.661 → +0.723).
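To make the AUROC figure concrete: it can be read as the probability that a randomly chosen AI sample receives a higher score than a randomly chosen human sample. The following is a minimal sketch of that pairwise definition, not the evaluation script used for the numbers above. The function name `auroc` and the toy scores are illustrative assumptions.

```python
def auroc(ai_scores, human_scores):
    """Probability that a random AI sample outscores a random human sample.

    Ties count as half a win, matching the usual rank-based AUROC.
    """
    if not ai_scores or not human_scores:
        raise ValueError("need at least one score from each class")
    wins = 0.0
    for a in ai_scores:
        for h in human_scores:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    return wins / (len(ai_scores) * len(human_scores))


# Toy, made-up scores for illustration only (not taken from the evaluation).
toy_ai = [3.1, 2.4, 0.2, 1.8]
toy_human = [0.0, 0.1, 0.3, 0.05]
print(auroc(toy_ai, toy_human))  # 15 of 16 pairs are ordered correctly -> 0.9375
```

The double loop is quadratic in the number of samples, which is acceptable for a validation set of a few thousand rows; a rank-based implementation would be preferable for larger sets.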

Usage

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model_name = "reneeice/ood-editguard-qwen3-1.7b"
base = "Qwen/Qwen3-1.7B-Base"

tok = AutoTokenizer.from_pretrained(model_name, use_fast=True)
backbone = PeftModel.from_pretrained(
    AutoModel.from_pretrained(base, torch_dtype=torch.bfloat16).to(device),
    model_name
).eval()

head = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/reneeice/ood-editguard-qwen3-1.7b/resolve/main/ood_head.pt",
    map_location="cpu"
)

hidden = 2048  # Qwen3-1.7B hidden size
proj = nn.Sequential(
    nn.LayerNorm(hidden, dtype=torch.float32),
    nn.Linear(hidden, head["out_dim"], bias=False, dtype=torch.float32),
).to(device)
proj.load_state_dict(head["proj"])
center = head["center"].to(device)
orientation = int(head["orientation"])

def ai_edit_score(texts):
    """Return oriented OOD distance; higher = more AI-edited."""
    enc = tok(texts, truncation=True, max_length=512, padding=True, return_tensors="pt")
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        h = backbone(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1).to(h.dtype)
        pooled = (h * mask).sum(1) / mask.sum(1).clamp(min=1)
        z = proj(pooled.float())
        z = F.normalize(z, dim=-1)
        return (orientation * ((z - center) ** 2).sum(-1)).tolist()

print(ai_edit_score(["A human-written sentence.", "This was entirely generated by an AI language model."]))

Higher score = more AI-edited. Calibrate a threshold on your own data.
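One possible way to calibrate a threshold is to score a sample of text you know to be human-written and pick the cutoff that keeps the false-positive rate under a chosen budget. The sketch below assumes that approach. The helper names `threshold_at_fpr` and `flag`, the 20% budget, and the example scores are hypothetical and not part of this model's release.

```python
def threshold_at_fpr(human_scores, target_fpr=0.05):
    """Return a cutoff such that at most `target_fpr` of `human_scores` lie strictly above it."""
    if not human_scores:
        raise ValueError("need at least one human score")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(human_scores)
    n = len(ranked)
    # Number of human samples we tolerate above the cutoff (epsilon guards float rounding).
    allowed = min(int(target_fpr * n + 1e-9), n - 1)
    # The cutoff is the (allowed + 1)-th largest score; only strictly higher scores are flagged.
    return ranked[n - 1 - allowed]


def flag(scores, threshold):
    """Mark each score as AI-edited (True) when it is strictly above the threshold."""
    return [s > threshold for s in scores]


# Made-up human scores for illustration; in practice these would come from ai_edit_score.
human_calibration = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.5, 0.9]
cutoff = threshold_at_fpr(human_calibration, target_fpr=0.2)
print(cutoff)                     # 0.08: two of the ten human scores lie above it
print(flag([0.04, 3.2], cutoff))  # [False, True]
```

A small calibration set gives a coarse estimate of the false-positive rate, so a larger sample from the target domain is preferable where available.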

How it was trained

  • Backbone: Qwen/Qwen3-1.7B-Base, bf16 + LoRA (rank 8, all attn+MLP projections).
  • Head: a small LayerNorm+Linear projection trained in full, with a DeepSVDD one-class objective: pull human embeddings toward a center c, push AI embeddings away. Score = oriented squared distance to c.
  • Data: 4,000 rows from pangram/editlens_iclr (1 epoch).
  • Supervision: edit-magnitude buckets from cosine_score (thresholds 0.03/0.15).
  • Compute: single NVIDIA A40, ~10 minutes.
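The supervision and objective described above can be sketched roughly as follows. This is an illustrative reading, not the training code. It assumes that a higher cosine_score means heavier editing, that bucket 0 corresponds to human text, and that the "push away" term is a hinge with a margin that grows with the bucket. The boundary handling, the `margin` parameter, and all function names are assumptions as well.

```python
def edit_bucket(cosine_score, low=0.03, high=0.15):
    """Map a cosine_score to a coarse edit-magnitude bucket (0, 1, or 2)."""
    if cosine_score < low:
        return 0  # assumed: human or essentially unedited
    if cosine_score < high:
        return 1  # assumed: moderately AI-edited
    return 2      # assumed: heavily AI-edited


def squared_distance(z, center):
    """Squared Euclidean distance between an embedding and the center c."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, center))


def one_class_loss(z, center, bucket, margin=1.0):
    """Hinge-style one-class loss for a single embedding (assumed form)."""
    d = squared_distance(z, center)
    if bucket == 0:
        return d  # pull human embeddings toward the center
    # Push AI embeddings away: penalize only while they sit inside the bucket-scaled margin.
    return max(0.0, margin * bucket - d)


# A human embedding at the center costs nothing; a heavily edited one there is penalized.
print(one_class_loss([0.0, 0.0], [0.0, 0.0], edit_bucket(0.01)))  # 0.0
print(one_class_loss([0.0, 0.0], [0.0, 0.0], edit_bucket(0.20)))  # 2.0
```

At inference only the squared distance matters, which is why the usage code reports the oriented squared distance to the center as the score.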

The project behind this model

This model is one of a family applying the OOD framing of Human Texts Are Outliers (NeurIPS 2025) to the EditLens continuous AI-edit detection task.

Model                                    Size  AUROC  Approach
ood-editguard-qwen3-0.6b                 0.6B  0.941  Trained OOD head
ood-editguard-qwen3-1.7b ← you are here  1.7B  0.955  Trained OOD head
editlens-ood-adapter-qwen3-0.6b          0.6B  0.688  Frozen-embedding adapter

Limitations

  • English text; best on inputs of roughly a paragraph or more (very short snippets are noisier).
  • The score reflects degree of AI editing, not authorship intent or quality.
  • Can be affected by domain shift; calibrate the threshold on data resembling your own.
  • Like all detectors, not immune to adversarial paraphrasing.
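Because very short snippets are noisier, one option is to withhold a score for them rather than report an unreliable number. The wrapper below is a minimal sketch of that idea. The name `score_if_long_enough` and the 40-word cutoff are arbitrary assumptions, and `score_fn` stands in for any batch scorer that returns one score per input text.

```python
def score_if_long_enough(texts, score_fn, min_words=40):
    """Score only texts with at least `min_words` whitespace-separated words.

    Returns one entry per input: a score, or None for texts judged too short.
    """
    long_idx = [i for i, t in enumerate(texts) if len(t.split()) >= min_words]
    results = [None] * len(texts)
    if long_idx:
        # Score the long texts in one batch, then put each score back in its original slot.
        scores = score_fn([texts[i] for i in long_idx])
        for i, s in zip(long_idx, scores):
            results[i] = s
    return results


# Stand-in scorer for demonstration only: returns the word count as a fake score.
def fake_scorer(batch):
    return [float(len(t.split())) for t in batch]


print(score_if_long_enough(["short one", "word " * 50], fake_scorer))  # [None, 50.0]
```

A suitable cutoff depends on the domain and would need to be checked against labeled examples.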

License

Apache-2.0. Built on Qwen/Qwen3-1.7B-Base. The supervision labels derive from the gated pangram/editlens_iclr dataset; please honor its terms.
