File size: 16,132 Bytes
105c9ff
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
---
license: mit
language:
  - en
tags:
  - text-classification
  - ai-text-detection
  - deberta-v3
  - binary-classification
  - nlp
datasets:
  - liamdugan/raid
  - artem9k/ai-text-detection-pile
  - gsingh1-py/train
  - cc_news
  - blog_authorship_corpus
  - webis/tldr-17
  - ChristophSchuhmann/essays-with-instructions
  - HuggingFaceH4/stack-exchange-preferences
  - pile-of-law/pile-of-law
metrics:
  - accuracy
  - f1
  - precision
  - recall
  - roc_auc
pipeline_tag: text-classification
model-index:
  - name: GLYPH
    results:
      - task:
          type: text-classification
          name: AI-Generated Text Detection
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.9885
          - name: F1
            type: f1
            value: 0.9901
          - name: Precision
            type: precision
            value: 0.9851
          - name: Recall
            type: recall
            value: 0.9952
          - name: ROC-AUC
            type: roc_auc
            value: 0.9990
          - name: MCC
            type: mcc
            value: 0.9765
---

# GLYPH β€” High-Accuracy AI Text Detector

GLYPH is a binary text classifier built on [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) that distinguishes human-written text from AI-generated text. It achieves **98.85% accuracy**, **0.999 ROC-AUC**, and **0.990 F1** on a held-out test set spanning 10 human writing domains and 14 AI model families β€” from GPT-2 (1.5B) through GPT-4 (~1T).

The model was trained on ~50K texts covering academic papers, news articles, blog posts, Reddit discussions, legal filings, Wikipedia, student essays, and technical Q&A on the human side, and outputs from 24 distinct AI model configurations across 10 model families on the AI side. It produces well-separated, high-confidence predictions (mean confidence 0.976) and remains accurate even at the strictest decision thresholds.

## Key Results

| Metric | Value |
|---|---|
| **Accuracy** | 98.85% |
| **F1 Score** | 0.9901 |
| **Precision** | 98.51% |
| **Recall** | 99.52% |
| **ROC-AUC** | 0.9990 |
| **Average Precision** | 0.9993 |
| **MCC** | 0.9765 |
| **Human Accuracy** | 97.94% |
| **AI Accuracy** | 99.52% |
| **Mean Confidence** | 0.976 |
| **F1 @ 0.95 threshold** | 0.987 |

All metrics evaluated on a held-out test set of 5,050 texts (2,136 human / 2,914 AI) with no overlap in source texts, split hashes, or temporal leakage with the training set.

## Per-Source Performance

### Human Text Sources

| Source | Domain | n | Accuracy | Confidence |
|---|---|---|---|---|
| PubMed Abstracts | Biomedical research | 300 | **100.0%** | 0.988 |
| Blog / Opinion | Personal blogs | 200 | **100.0%** | 0.987 |
| Reddit Writing | Informal / social | 300 | **100.0%** | 0.985 |
| Wikipedia | Encyclopedic | 500 | **99.8%** | 0.987 |
| CC-News | Journalism | 392 | **99.5%** | 0.981 |
| arXiv Abstracts | Academic / scientific | 444 | **90.8%** | 0.948 |

arXiv abstracts are the hardest category β€” highly formulaic academic prose with structural similarity to AI output. Even so, detection accuracy is 90.8% with 94.8% mean confidence, and the remaining errors are concentrated in a small subset of unusually short or template-heavy abstracts.

### AI Model Families

| Model | Family | Params | n | Accuracy | F1 |
|---|---|---|---|---|---|
| GPT-3.5-Turbo | OpenAI | 175B | 223 | **100.0%** | 1.000 |
| GPT-4 | OpenAI | ~1T | 215 | **100.0%** | 1.000 |
| Llama-2-70B-Chat | Meta | 70B | 191 | **100.0%** | 1.000 |
| MPT-30B | MosaicML | 30B | 211 | **100.0%** | 1.000 |
| MPT-30B-Chat | MosaicML | 30B | 191 | **100.0%** | 1.000 |
| Mistral-7B-Instruct-v0.1 | Mistral AI | 7B | 194 | **100.0%** | 1.000 |
| Mistral-7B-v0.1 | Mistral AI | 7B | 203 | **100.0%** | 1.000 |
| Llama-3.1-8B-Instruct | Meta | 8B | 238 | **99.6%** | 0.998 |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B | 238 | **99.6%** | 0.998 |
| Command-Chat | Cohere | 52B | 198 | **99.5%** | 0.997 |
| Text-Davinci-002 | OpenAI | 175B | 176 | **99.4%** | 0.997 |
| Llama-3.2-3B-Instruct | Meta | 3B | 238 | **99.2%** | 0.996 |
| GPT-2-XL | OpenAI | 1.5B | 198 | **98.5%** | 0.992 |
| Cohere Command | Cohere | 52B | 200 | **97.5%** | 0.987 |

Detection is robust across four generations of language models (GPT-2 through GPT-4), three access paradigms (open-weight, API-only, and proprietary), and parameter counts spanning three orders of magnitude (1.5B to ~1T).

### Performance by Text Length

| Length Bucket | n | Accuracy | F1 |
|---|---|---|---|
| Very Long (>2000 words) | 103 | **100.0%** | 1.000 |
| Long (500–2000 words) | 862 | **99.9%** | 0.999 |
| Short (50–150 words) | 1,976 | **98.5%** | 0.989 |
| Medium (150–500 words) | 1,634 | **98.8%** | 0.989 |
| Very Short (<50 words) | 475 | **98.1%** | 0.899 |

Performance degrades gracefully with shorter inputs. Even on texts under 50 words β€” where the model has minimal signal β€” accuracy remains above 98%.

### Threshold Sensitivity

The model produces well-calibrated, high-confidence outputs. Performance holds across aggressive decision thresholds:

| P(AI) Threshold | F1 | Precision |
|---|---|---|
| 0.50 (default) | 0.990 | 0.985 |
| 0.60 | 0.991 | 0.987 |
| 0.70 | 0.992 | 0.990 |
| 0.80 | 0.992 | 0.992 |
| 0.90 | 0.991 | 0.993 |
| 0.95 | 0.987 | 0.996 |

At a 0.95 threshold, precision reaches 99.6% with only a 0.3% drop in F1 β€” suitable for high-stakes applications where false accusations of AI usage carry serious consequences.

## Architecture

| Component | Details |
|---|---|
| Base model | `microsoft/deberta-v3-base` (184M parameters) |
| Architecture | DeBERTa-v3 with disentangled attention and enhanced mask decoder |
| Task head | Linear classifier (768 β†’ 2) with 0.15 dropout |
| Tokenizer | SentencePiece (slow tokenizer, `use_fast=False`) |
| Max sequence length | 512 tokens |
| Output | `[P(human), P(AI)]` softmax probabilities |

DeBERTa-v3 was chosen over RoBERTa and BERT alternatives due to its disentangled attention mechanism, which separately encodes content and position. This is particularly relevant for AI text detection: language models have characteristic positional dependencies in how they distribute tokens across a sequence, and disentangled attention gives the classifier direct access to these patterns.

## Training

### Configuration

| Parameter | Value |
|---|---|
| Trainable parameters | 184,423,682 (100% β€” all layers unfrozen) |
| Optimizer | AdamW (weight decay 0.01) |
| Learning rate | 2e-5 (cosine schedule) |
| Warmup | 10% of total steps |
| Effective batch size | 64 (16 Γ— 4 gradient accumulation) |
| Precision | bf16 mixed precision |
| Gradient checkpointing | Enabled (non-reentrant) |
| Label smoothing | 0.05 |
| Class weights | human=1.182, ai=0.867 |
| Epochs | 8 (early-stopped at 3.17) |
| Best checkpoint | Epoch 1.19 (by validation F1) |
| Training time | ~49 minutes on RTX 4070 Ti 12GB |
| Final train loss | 0.186 |
| Final eval loss | 0.150 |

### Why Fully Unfrozen?

Initial experiments with 4 frozen encoder layers (standard practice from PAN-CLEF 2025 literature) yielded only 80% accuracy with severe human-side bias β€” the model classified 44% of human texts as AI. Freezing 4 of 12 layers in DeBERTa-base locks 33% of the network, far more aggressive than the 21% reported for DeBERTa-large. Unfreezing all layers with cosine LR decay and 10% warmup resolved the bias entirely, lifting human accuracy from 55.6% to 97.9% without sacrificing AI detection (97.4% β†’ 99.5%).

### Dataset Composition

**Total: 50,458 texts** (40,364 train / 5,044 validation / 5,050 test)

Stratified by source with hash-based deduplication to prevent data leakage.

#### Human Sources (10 domains, ~29K target)

| Domain | Source | Target Count | Text Type |
|---|---|---|---|
| Academic (STEM) | arXiv API | 5,000 | Abstracts across 8 categories (cs.CL, cs.AI, cs.LG, physics, math, q-bio, econ, stat) |
| Academic (Medical) | PubMed API | 3,000 | Biomedical research abstracts |
| Encyclopedic | Wikipedia API | 5,000 | Article sections across 10 topic categories |
| Journalism | CC-News (HuggingFace) | 4,000 | News articles |
| Literary / Creative | Project Gutenberg | 2,000 | Public domain book excerpts |
| Informal / Social | Reddit (webis/tldr-17) | 3,000 | Writing-focused subreddit posts |
| Student / Educational | PERSUADE corpus | 2,000 | Student essays |
| Technical / Q&A | StackExchange | 2,000 | Technical answers |
| Blog / Opinion | Blog Authorship Corpus | 2,000 | Personal blog posts |
| Legal / Formal | Pile of Law | 1,000 | Legal opinions and case summaries |

#### AI Sources (24 model configurations across 10 families)

**Locally generated via LM Studio (8 models, Q4_K_M quantization):**

| Model | Family | Parameters |
|---|---|---|
| Llama-3.1-8B-Instruct | Meta Llama | 8B |
| Llama-3.2-3B-Instruct | Meta Llama | 3B |
| Mistral-7B-Instruct-v0.3 | Mistral AI | 7B |
| Qwen2.5-7B-Instruct | Alibaba Qwen | 7B |
| Qwen2.5-14B-Instruct | Alibaba Qwen | 14B |
| Gemma-2-9B-Instruct | Google | 9B |
| Phi-3.5-Mini-Instruct | Microsoft | 3.8B |
| DeepSeek-V2-Lite-Chat | DeepSeek | 16B (MoE) |

Local generation used 4 temperature/sampling configurations (default, creative, precise, varied) across 6 prompt strategies (direct, continue, rewrite, expand, style_mimic, question_answer) with a system prompt enforcing natural human-like output β€” no markdown, no meta-commentary, no self-referential AI language.

**HuggingFace datasets (16 additional model families):**

| Dataset | Models Added | Reference |
|---|---|---|
| RAID (ACL 2024) | ChatGPT-3.5, GPT-4, GPT-3-Davinci, Cohere Command, Llama-2-70B-Chat, Mistral-7B-v0.1, Mixtral-8x7B, MPT-30B, GPT-2-XL | [liamdugan/raid](https://huggingface.co/datasets/liamdugan/raid) |
| AI Text Detection Pile | GPT-2/3/J/ChatGPT (mixed) | [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile) |
| NYT Multi-Model | GPT-4o, Yi-Large, Qwen-2-72B, Llama-3-8B, Gemma-2-9B, Mistral-7B | [gsingh1-py/train](https://huggingface.co/datasets/gsingh1-py/train) |

This combination ensures coverage of proprietary API models (GPT-3.5, GPT-4, GPT-4o, Cohere), large open models exceeding consumer GPU VRAM (Llama-2-70B, Qwen-2-72B, Mixtral-8x7B, Yi-Large), older architectures (GPT-2, GPT-3, GPT-J), and mixture-of-experts models (Mixtral, DeepSeek-V2-Lite). RAID data was filtered to non-adversarial generations only (`attack=="none"`) for training data quality.

## Usage

### With Transformers

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "ogmatrixllm/glyph"  # Replace with your repo path
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "Your text to classify here..."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    
p_human, p_ai = probs[0].tolist()
label = "AI-generated" if p_ai > 0.5 else "Human-written"
confidence = max(p_human, p_ai)

print(f"{label} (confidence: {confidence:.1%})")
```

### With Pipeline

```python
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="ogmatrixai/glyph",  # Replace with your repo path
    tokenizer=AutoTokenizer.from_pretrained("ogmatrixai/glyph", use_fast=False),
)

result = detector("Your text here...")
print(result)
# [{'label': 'LABEL_1', 'score': 0.98}]  # LABEL_0 = human, LABEL_1 = AI
```

### Important Notes

- **Tokenizer**: Always use `use_fast=False`. The fast tokenizer for DeBERTa-v3 has a confirmed regression in `transformers>=4.47` ([#42583](https://github.com/huggingface/transformers/issues/42583)) that crashes on load.
- **Max length**: The model was trained with `max_length=512`. Longer texts should be truncated or chunked with predictions aggregated.
- **Labels**: `LABEL_0` = human, `LABEL_1` = AI-generated.

## Limitations and Ethical Considerations

### Known Limitations

1. **English only.** GLYPH was trained exclusively on English text. Performance on other languages is untested and likely degraded.

2. **Training distribution.** The model has seen outputs from 24 specific AI model configurations. Novel architectures, heavily fine-tuned models, or future model families may evade detection. AI text detection is fundamentally adversarial β€” no static detector provides permanent robustness.

3. **arXiv abstracts remain the hardest domain** at 90.8% accuracy. Highly formulaic academic writing with rigid structural conventions shares surface features with AI-generated text. Users in academic integrity contexts should treat borderline predictions on scientific abstracts with appropriate caution.

4. **Short texts (<50 words)** have reduced F1 (0.899) despite high accuracy (98.1%). With minimal token-level signal, the model occasionally produces confident but incorrect predictions. For short-form content, consider requiring higher confidence thresholds.

5. **Adversarial attacks.** The training data includes only non-adversarial AI outputs. Paraphrasing attacks, homoglyph substitution, targeted prompt engineering, and watermark-removal techniques were not included. Dedicated adversarial robustness (e.g., RAID adversarial subsets) is a planned enhancement.

6. **Mixed authorship.** GLYPH classifies at the document level. It does not detect partial AI usage (e.g., AI-written paragraphs embedded in a human-written essay). Sentence-level or span-level detection requires a different approach.

7. **512-token window.** Texts are truncated at 512 tokens. For long documents, this means classification is based on the opening ~350–400 words only. Sliding-window aggregation is recommended for long-form content.

### Ethical Considerations

AI text detection carries real consequences β€” academic penalties, professional reputation damage, content moderation decisions. False positives (human text classified as AI) are particularly harmful. While GLYPH's false positive rate is low (2.06% on the test set, 44 out of 2,136 human texts), no detector achieves zero false positives.

**Recommendations for responsible deployment:**

- Never use GLYPH as the sole basis for punitive action. Use it as one signal among many (metadata, behavioral patterns, stylometric analysis).
- Apply a high confidence threshold (β‰₯0.95) for consequential decisions. At this threshold, precision reaches 99.6%.
- Provide users with the confidence score, not just a binary label. A text scored at P(AI)=0.52 is fundamentally different from one scored at P(AI)=0.99.
- Maintain an appeals process. Statistical classifiers will always produce errors.
- Acknowledge the base rate problem. In populations where AI usage is rare, even a 2% FPR produces many false accusations relative to true detections.

## Training Infrastructure

| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti (12GB VRAM) |
| CPU | Intel Core i7-14700K (20 cores) |
| RAM | 48GB DDR5 |
| Framework | PyTorch 2.6+ / HuggingFace Transformers |
| Precision | bf16 mixed precision |
| Total training time | 49 minutes |
| Experiment tracking | Weights & Biases |

## Citation

```bibtex
@misc{glyph2026,
  title={GLYPH: High-Accuracy AI Text Detection with DeBERTa-v3},
  author={OGMatrix},
  year={2026},
  url={https://huggingface.co/ogmatrixllm/glyph}
}
```

## Acknowledgments

Training data incorporates the [RAID benchmark](https://huggingface.co/datasets/liamdugan/raid) (Dugan et al., ACL 2024), the [AI Text Detection Pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile), and the [NYT Multi-Model dataset](https://huggingface.co/datasets/gsingh1-py/train). Human text sources include arXiv, PubMed, Wikipedia, CC-News, Project Gutenberg, Reddit, StackExchange, Blog Authorship Corpus, PERSUADE, and Pile of Law. The base model is [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) by Microsoft Research.