---
license: mit
language:
- en
tags:
- text-classification
- ai-text-detection
- deberta-v3
- binary-classification
- nlp
datasets:
- liamdugan/raid
- artem9k/ai-text-detection-pile
- gsingh1-py/train
- cc_news
- blog_authorship_corpus
- webis/tldr-17
- ChristophSchuhmann/essays-with-instructions
- HuggingFaceH4/stack-exchange-preferences
- pile-of-law/pile-of-law
metrics:
- accuracy
- f1
- precision
- recall
- roc_auc
pipeline_tag: text-classification
model-index:
- name: GLYPH
  results:
  - task:
      type: text-classification
      name: AI-Generated Text Detection
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9885
    - name: F1
      type: f1
      value: 0.9901
    - name: Precision
      type: precision
      value: 0.9851
    - name: Recall
      type: recall
      value: 0.9952
    - name: ROC-AUC
      type: roc_auc
      value: 0.9990
    - name: MCC
      type: mcc
      value: 0.9765
---
# GLYPH: High-Accuracy AI Text Detector
GLYPH is a binary text classifier built on [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) that distinguishes human-written text from AI-generated text. It achieves **98.85% accuracy**, **0.999 ROC-AUC**, and **0.990 F1** on a held-out test set spanning 6 human text sources and 14 AI models, from GPT-2 (1.5B) through GPT-4 (~1T).
The full dataset contains ~50K texts, of which about 40K are used for training. The human side covers academic papers, news articles, blog posts, Reddit discussions, legal filings, Wikipedia, student essays, and technical Q&A; the AI side covers outputs from 24 distinct AI model configurations across 10 model families. The model produces well-separated, high-confidence predictions (mean confidence 0.976) and remains accurate at decision thresholds up to 0.95 (F1 0.987).
## Key Results
| Metric | Value |
|---|---|
| **Accuracy** | 98.85% |
| **F1 Score** | 0.9901 |
| **Precision** | 98.51% |
| **Recall** | 99.52% |
| **ROC-AUC** | 0.9990 |
| **Average Precision** | 0.9993 |
| **MCC** | 0.9765 |
| **Human Accuracy** | 97.94% |
| **AI Accuracy** | 99.52% |
| **Mean Confidence** | 0.976 |
| **F1 @ 0.95 threshold** | 0.987 |
All metrics are evaluated on a held-out test set of 5,050 texts (2,136 human / 2,914 AI). The test set has no source-text or content-hash overlap with the training set and no temporal leakage from it.
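
As a consistency check, the headline metrics can be reproduced from a confusion matrix. The sketch below assumes counts reconstructed from figures in this card rather than published directly: 44 false positives (stated under Ethical Considerations) and 14 false negatives (implied by 99.52% recall on 2,914 AI texts). The function name is illustrative.

```python
import math

def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard binary-classification metrics (positive class = AI)."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "human_accuracy": tn / (tn + fp),
    }

# Assumed counts, reconstructed from the reported rates (not published directly).
metrics = binary_metrics(tp=2900, fn=14, fp=44, tn=2092)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, every value agrees with the table above to four decimal places.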
## Per-Source Performance
### Human Text Sources
| Source | Domain | n | Accuracy | Confidence |
|---|---|---|---|---|
| PubMed Abstracts | Biomedical research | 300 | **100.0%** | 0.988 |
| Blog / Opinion | Personal blogs | 200 | **100.0%** | 0.987 |
| Reddit Writing | Informal / social | 300 | **100.0%** | 0.985 |
| Wikipedia | Encyclopedic | 500 | **99.8%** | 0.987 |
| CC-News | Journalism | 392 | **99.5%** | 0.981 |
| arXiv Abstracts | Academic / scientific | 444 | **90.8%** | 0.948 |
arXiv abstracts are the hardest category: highly formulaic academic prose with structural similarity to AI output. Even so, accuracy on this source is 90.8% with a mean confidence of 0.948, and the remaining errors are concentrated in a small subset of unusually short or template-heavy abstracts.
### AI Model Families
| Model | Family | Params | n | Accuracy | F1 |
|---|---|---|---|---|---|
| GPT-3.5-Turbo | OpenAI | 175B | 223 | **100.0%** | 1.000 |
| GPT-4 | OpenAI | ~1T | 215 | **100.0%** | 1.000 |
| Llama-2-70B-Chat | Meta | 70B | 191 | **100.0%** | 1.000 |
| MPT-30B | MosaicML | 30B | 211 | **100.0%** | 1.000 |
| MPT-30B-Chat | MosaicML | 30B | 191 | **100.0%** | 1.000 |
| Mistral-7B-Instruct-v0.1 | Mistral AI | 7B | 194 | **100.0%** | 1.000 |
| Mistral-7B-v0.1 | Mistral AI | 7B | 203 | **100.0%** | 1.000 |
| Llama-3.1-8B-Instruct | Meta | 8B | 238 | **99.6%** | 0.998 |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B | 238 | **99.6%** | 0.998 |
| Command-Chat | Cohere | 52B | 198 | **99.5%** | 0.997 |
| Text-Davinci-002 | OpenAI | 175B | 176 | **99.4%** | 0.997 |
| Llama-3.2-3B-Instruct | Meta | 3B | 238 | **99.2%** | 0.996 |
| GPT-2-XL | OpenAI | 1.5B | 198 | **98.5%** | 0.992 |
| Cohere Command | Cohere | 52B | 200 | **97.5%** | 0.987 |
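Detection is robust across four generations of language models (GPT-2 through GPT-4), three access paradigms (open-weight, API-only, and proprietary), and parameter counts spanning nearly three orders of magnitude (1.5B to ~1T). Parameter counts for proprietary models such as GPT-3.5-Turbo and GPT-4 have not been officially disclosed and should be read as estimates.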
### Performance by Text Length
| Length Bucket | n | Accuracy | F1 |
|---|---|---|---|
| Very Long (>2000 words) | 103 | **100.0%** | 1.000 |
| Long (500–2000 words) | 862 | **99.9%** | 0.999 |
| Medium (150–500 words) | 1,634 | **98.8%** | 0.989 |
| Short (50–150 words) | 1,976 | **98.5%** | 0.989 |
| Very Short (<50 words) | 475 | **98.1%** | 0.899 |
Accuracy degrades gracefully with shorter inputs. Even on texts under 50 words, where the model has minimal signal, accuracy remains above 98%, although F1 falls to 0.899 in that bucket.
### Threshold Sensitivity
The model produces well-calibrated, high-confidence outputs. Performance holds across aggressive decision thresholds:
| P(AI) Threshold | F1 | Precision |
|---|---|---|
| 0.50 (default) | 0.990 | 0.985 |
| 0.60 | 0.991 | 0.987 |
| 0.70 | 0.992 | 0.990 |
| 0.80 | 0.992 | 0.992 |
| 0.90 | 0.991 | 0.993 |
| 0.95 | 0.987 | 0.996 |
At a 0.95 threshold, precision reaches 99.6% while F1 drops by only 0.003 (0.990 to 0.987), which suits high-stakes applications where false accusations of AI usage carry serious consequences.
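
The table omits recall, but it can be recovered from F1 and precision, since F1 is their harmonic mean. A rough sketch follows; because the inputs are rounded to three decimals, the recovered values are approximate, and the helper name is illustrative.

```python
def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R."""
    return f1 * precision / (2 * precision - f1)

# (threshold, F1, precision) rows copied from the threshold table.
rows = [
    (0.50, 0.990, 0.985),
    (0.60, 0.991, 0.987),
    (0.70, 0.992, 0.990),
    (0.80, 0.992, 0.992),
    (0.90, 0.991, 0.993),
    (0.95, 0.987, 0.996),
]

recalls = {t: implied_recall(f1, p) for t, f1, p in rows}
for t, r in recalls.items():
    print(f"threshold {t:.2f}: recall ~ {r:.3f}")
```

With these rounded inputs, recall falls from roughly 0.995 at the default threshold to roughly 0.978 at 0.95, which is the cost of the higher precision.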
## Architecture
| Component | Details |
|---|---|
| Base model | `microsoft/deberta-v3-base` (184M parameters) |
| Architecture | DeBERTa-v3 with disentangled attention and enhanced mask decoder |
| Task head | Linear classifier (768 → 2) with 0.15 dropout |
| Tokenizer | SentencePiece (slow tokenizer, `use_fast=False`) |
| Max sequence length | 512 tokens |
| Output | `[P(human), P(AI)]` softmax probabilities |
DeBERTa-v3 was chosen over RoBERTa and BERT alternatives due to its disentangled attention mechanism, which separately encodes content and position. This is particularly relevant for AI text detection: language models have characteristic positional dependencies in how they distribute tokens across a sequence, and disentangled attention gives the classifier direct access to these patterns.
## Training
### Configuration
| Parameter | Value |
|---|---|
| Trainable parameters | 184,423,682 (100%; all layers unfrozen) |
| Optimizer | AdamW (weight decay 0.01) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup | 10% of total steps |
| Effective batch size | 64 (16 × 4 gradient accumulation) |
| Precision | bf16 mixed precision |
| Gradient checkpointing | Enabled (non-reentrant) |
| Label smoothing | 0.05 |
| Class weights | human=1.182, ai=0.867 |
| Epochs | 8 (early-stopped at 3.17) |
| Best checkpoint | Epoch 1.19 (by validation F1) |
| Training time | ~49 minutes on RTX 4070 Ti 12GB |
| Final train loss | 0.186 |
| Final eval loss | 0.150 |
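
The card does not say how the class weights were derived. One common choice, inverse-frequency ("balanced") weighting, happens to reproduce the reported values when applied to the test split's class counts, which should mirror the training split if the splits are stratified. The sketch below makes that assumption concrete; it is not the confirmed training code.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}

# Class counts from the held-out test split, used as a stand-in for the
# training split, whose per-class counts are not reported.
weights = balanced_class_weights({"human": 2136, "ai": 2914})
print({label: round(w, 3) for label, w in weights.items()})
```

In PyTorch, weights like these are typically passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, which also accepts a `label_smoothing` argument.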
### Why Fully Unfrozen?
Initial experiments with 4 frozen encoder layers (standard practice from PAN-CLEF 2025 literature) yielded only 80% accuracy with a severe bias against human text: the model classified 44% of human texts as AI. Freezing 4 of 12 layers in DeBERTa-base locks 33% of the encoder layers, far more aggressive than the 21% reported for DeBERTa-large. Unfreezing all layers with cosine LR decay and 10% warmup resolved the bias, lifting human accuracy from 55.6% to 97.9% without sacrificing AI detection (97.4% → 99.5%).
### Dataset Composition
**Total: 50,458 texts** (40,364 train / 5,044 validation / 5,050 test)
Stratified by source with hash-based deduplication to prevent data leakage.
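
A minimal sketch of hash-based deduplication, assuming SHA-256 over whitespace-normalized, lowercased text. The card does not specify the hash function or the normalization, so both are illustrative choices.

```python
import hashlib

def content_hash(text: str) -> str:
    """Hash a normalized form of the text so trivial variants collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def deduplicate(texts: list) -> list:
    """Keep the first occurrence of each distinct normalized text."""
    seen = set()
    unique = []
    for text in texts:
        digest = content_hash(text)
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

samples = ["The cat sat.", "the  cat sat.", "A different sentence."]
unique_samples = deduplicate(samples)
print(unique_samples)
```

Deduplicating before splitting ensures that the same text cannot land in both the training and test sets.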
#### Human Sources (10 domains, ~29K target)
| Domain | Source | Target Count | Text Type |
|---|---|---|---|
| Academic (STEM) | arXiv API | 5,000 | Abstracts across 8 categories (cs.CL, cs.AI, cs.LG, physics, math, q-bio, econ, stat) |
| Academic (Medical) | PubMed API | 3,000 | Biomedical research abstracts |
| Encyclopedic | Wikipedia API | 5,000 | Article sections across 10 topic categories |
| Journalism | CC-News (HuggingFace) | 4,000 | News articles |
| Literary / Creative | Project Gutenberg | 2,000 | Public domain book excerpts |
| Informal / Social | Reddit (webis/tldr-17) | 3,000 | Writing-focused subreddit posts |
| Student / Educational | PERSUADE corpus | 2,000 | Student essays |
| Technical / Q&A | StackExchange | 2,000 | Technical answers |
| Blog / Opinion | Blog Authorship Corpus | 2,000 | Personal blog posts |
| Legal / Formal | Pile of Law | 1,000 | Legal opinions and case summaries |
#### AI Sources (24 model configurations across 10 families)
**Locally generated via LM Studio (8 models, Q4_K_M quantization):**
| Model | Family | Parameters |
|---|---|---|
| Llama-3.1-8B-Instruct | Meta Llama | 8B |
| Llama-3.2-3B-Instruct | Meta Llama | 3B |
| Mistral-7B-Instruct-v0.3 | Mistral AI | 7B |
| Qwen2.5-7B-Instruct | Alibaba Qwen | 7B |
| Qwen2.5-14B-Instruct | Alibaba Qwen | 14B |
| Gemma-2-9B-Instruct | Google | 9B |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B |
| DeepSeek-V2-Lite-Chat | DeepSeek | 16B (MoE) |
Local generation used 4 temperature/sampling configurations (default, creative, precise, varied) across 6 prompt strategies (direct, continue, rewrite, expand, style_mimic, question_answer), with a system prompt enforcing natural human-like output: no markdown, no meta-commentary, and no self-referential AI language.
**HuggingFace datasets (16 additional model configurations):**
| Dataset | Models Added | Reference |
|---|---|---|
| RAID (ACL 2024) | ChatGPT-3.5, GPT-4, GPT-3-Davinci, Cohere Command, Llama-2-70B-Chat, Mistral-7B-v0.1, Mixtral-8x7B, MPT-30B, GPT-2-XL | [liamdugan/raid](https://huggingface.co/datasets/liamdugan/raid) |
| AI Text Detection Pile | GPT-2/3/J/ChatGPT (mixed) | [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile) |
| NYT Multi-Model | GPT-4o, Yi-Large, Qwen-2-72B, Llama-3-8B, Gemma-2-9B, Mistral-7B | [gsingh1-py/train](https://huggingface.co/datasets/gsingh1-py/train) |
This combination ensures coverage of proprietary API models (GPT-3.5, GPT-4, GPT-4o, Cohere), large open models exceeding consumer GPU VRAM (Llama-2-70B, Qwen-2-72B, Mixtral-8x7B, Yi-Large), older architectures (GPT-2, GPT-3, GPT-J), and mixture-of-experts models (Mixtral, DeepSeek-V2-Lite). RAID data was filtered to non-adversarial generations only (`attack=="none"`) for training data quality.
## Usage
### With Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ogmatrixllm/glyph"  # Replace with your repo path
# The slow (SentencePiece) tokenizer is required; see Important Notes.
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "Your text to classify here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax output order is [P(human), P(AI)].
probs = torch.softmax(logits, dim=-1)
p_human, p_ai = probs[0].tolist()

label = "AI-generated" if p_ai > 0.5 else "Human-written"
confidence = max(p_human, p_ai)
print(f"{label} (confidence: {confidence:.1%})")
```
### With Pipeline
```python
from transformers import AutoTokenizer, pipeline

model_name = "ogmatrixllm/glyph"  # Replace with your repo path
detector = pipeline(
    "text-classification",
    model=model_name,
    # Pass the slow tokenizer explicitly; see Important Notes.
    tokenizer=AutoTokenizer.from_pretrained(model_name, use_fast=False),
)

result = detector("Your text here...")
print(result)
# [{'label': 'LABEL_1', 'score': 0.98}]  # LABEL_0 = human, LABEL_1 = AI
```
### Important Notes
- **Tokenizer**: Always use `use_fast=False`. The fast tokenizer for DeBERTa-v3 has a confirmed regression in `transformers>=4.47` ([#42583](https://github.com/huggingface/transformers/issues/42583)) that crashes on load.
- **Max length**: The model was trained with `max_length=512`. Longer texts should be truncated or chunked with predictions aggregated.
- **Labels**: `LABEL_0` = human, `LABEL_1` = AI-generated.
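
One way to handle texts longer than 512 tokens is sliding-window aggregation. The sketch below is only an illustration: the stride, the mean aggregation, and the `score_window` callable (a stand-in for running the model on one window and returning P(AI)) are choices made here, not part of the released model.

```python
from typing import Callable, List, Sequence

def sliding_windows(token_ids: Sequence[int], window: int = 512,
                    stride: int = 256) -> List[Sequence[int]]:
    """Split token ids into overlapping windows of at most `window` tokens."""
    if len(token_ids) <= window:
        return [token_ids]
    starts = list(range(0, len(token_ids) - window, stride))
    starts.append(len(token_ids) - window)  # final window ends at the last token
    return [token_ids[s:s + window] for s in starts]

def aggregate_p_ai(token_ids: Sequence[int],
                   score_window: Callable[[Sequence[int]], float],
                   window: int = 512, stride: int = 256) -> float:
    """Average P(AI) over all windows."""
    chunks = sliding_windows(token_ids, window, stride)
    scores = [score_window(chunk) for chunk in chunks]
    return sum(scores) / len(scores)

# Demo with a dummy scorer; replace it with a call to the model.
# Real inputs also need room for special tokens, so a slightly smaller
# content window may be more appropriate.
fake_ids = list(range(1000))
p_ai = aggregate_p_ai(fake_ids, score_window=lambda chunk: 0.9)
print(f"windows: {len(sliding_windows(fake_ids))}, mean P(AI): {p_ai:.2f}")
```

Mean aggregation suits document-level decisions; taking the maximum over windows would instead flag any single AI-like passage, at the cost of more false positives.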
## Limitations and Ethical Considerations
### Known Limitations
1. **English only.** GLYPH was trained exclusively on English text. Performance on other languages is untested and likely degraded.
2. **Training distribution.** The model has seen outputs from 24 specific AI model configurations. Novel architectures, heavily fine-tuned models, or future model families may evade detection. AI text detection is fundamentally adversarial: no static detector provides permanent robustness.
3. **arXiv abstracts remain the hardest domain** at 90.8% accuracy. Highly formulaic academic writing with rigid structural conventions shares surface features with AI-generated text. Users in academic integrity contexts should treat borderline predictions on scientific abstracts with appropriate caution.
4. **Short texts (<50 words)** have reduced F1 (0.899) despite high accuracy (98.1%). With minimal token-level signal, the model occasionally produces confident but incorrect predictions. For short-form content, consider requiring higher confidence thresholds.
5. **Adversarial attacks.** The training data includes only non-adversarial AI outputs. Paraphrasing attacks, homoglyph substitution, targeted prompt engineering, and watermark-removal techniques were not included. Dedicated adversarial robustness (e.g., RAID adversarial subsets) is a planned enhancement.
6. **Mixed authorship.** GLYPH classifies at the document level. It does not detect partial AI usage (e.g., AI-written paragraphs embedded in a human-written essay). Sentence-level or span-level detection requires a different approach.
7. **512-token window.** Texts are truncated at 512 tokens. For long documents, this means classification is based on the opening ~350–400 words only. Sliding-window aggregation is recommended for long-form content.
### Ethical Considerations
AI text detection carries real consequences: academic penalties, professional reputation damage, and content moderation decisions. False positives (human text classified as AI) are particularly harmful. While GLYPH's false positive rate is low (2.06% on the test set, 44 out of 2,136 human texts), no detector achieves zero false positives.
**Recommendations for responsible deployment:**
- Never use GLYPH as the sole basis for punitive action. Use it as one signal among many (metadata, behavioral patterns, stylometric analysis).
- Apply a high confidence threshold (≥0.95) for consequential decisions. At this threshold, precision reaches 99.6%.
- Provide users with the confidence score, not just a binary label. A text scored at P(AI)=0.52 is fundamentally different from one scored at P(AI)=0.99.
- Maintain an appeals process. Statistical classifiers will always produce errors.
- Acknowledge the base rate problem. In populations where AI usage is rare, even a 2% FPR produces many false accusations relative to true detections.
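
To make the base-rate point concrete, the sketch below applies Bayes' rule using the test-set recall (99.52%) and false positive rate (2.06%). The prevalence values are hypothetical, and the calculation assumes that error rates in deployment match the test set, which may not hold.

```python
def positive_predictive_value(prevalence: float, recall: float, fpr: float) -> float:
    """P(text is AI | flagged as AI), by Bayes' rule."""
    true_pos = prevalence * recall
    false_pos = (1.0 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)

RECALL = 0.9952  # AI accuracy on the test set
FPR = 0.0206     # 44 of 2,136 human texts flagged

ppv_by_prevalence = {
    prev: positive_predictive_value(prev, RECALL, FPR)
    for prev in (0.01, 0.05, 0.20, 0.50)  # hypothetical share of AI texts
}
for prev, ppv in ppv_by_prevalence.items():
    print(f"AI prevalence {prev:.0%}: {ppv:.1%} of flagged texts are AI")
```

With AI text at 1% of submissions, roughly two of every three flagged texts would be human-written under these assumptions.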
## Training Infrastructure
| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti (12GB VRAM) |
| CPU | Intel Core i7-14700K (20 cores) |
| RAM | 48GB DDR5 |
| Framework | PyTorch 2.6+ / HuggingFace Transformers |
| Precision | bf16 mixed precision |
| Total training time | 49 minutes |
| Experiment tracking | Weights & Biases |
## Citation
```bibtex
@misc{glyph2026,
title={GLYPH: High-Accuracy AI Text Detection with DeBERTa-v3},
author={OGMatrix},
year={2026},
url={https://huggingface.co/ogmatrixllm/glyph}
}
```
## Acknowledgments
Training data incorporates the [RAID benchmark](https://huggingface.co/datasets/liamdugan/raid) (Dugan et al., ACL 2024), the [AI Text Detection Pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile), and the [NYT Multi-Model dataset](https://huggingface.co/datasets/gsingh1-py/train). Human text sources include arXiv, PubMed, Wikipedia, CC-News, Project Gutenberg, Reddit, StackExchange, Blog Authorship Corpus, PERSUADE, and Pile of Law. The base model is [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) by Microsoft Research.