Add comparison with Anthropic NLA
README.md
@@ -40,6 +40,23 @@ Strong: domain identification — code vs legal vs literary, specific tokens and
Weak: exact details within a domain. Gets "fantasy fiction" right but invents castle/dragon/princess when the actual text is about time travel in Warsaw. Hallucinated specifics are the main failure mode.

## Comparison with Anthropic NLA

Side-by-side outputs on the same activations (Qwen 2.5 7B, layer 20). The Anthropic AV is `kitft/nla-qwen2.5-7b-L20-av`.

| Text | Ground truth | Ours | Anthropic |
|---|---|---|---|
| GDPR legal | Article 17(3)(b), erasure request, compliance-advisory register | GDPR Article 15, Right to Access, data subject rights | "Data flow diagram using the c..." |
| Surrealist prose | Dream-sequence, iambic pentameter, synesthetic image, "confession" | Metaphor, personification, whimsical, surreal | "Poetic form, existential or philosophical themes" |
| Dual-use security | Honeypot code vs audit, Etherscan, Solidity | "malware, exploit, payload" vs refusal with explanation | "Linux kernel vulnerabilities, I am a bot" |
| Prompt injection | "Ignore all previous instructions", secret key, cloze-completion | Security-disclosure, password-reuse, compliance vs refusal | "Placeholder", "random text", closing pattern |
| Sci-fi (Mars) | John, Oltar, Mars, sensory-first vs character-interiority | Spaceport, alien species (Elarans), human/alien tension | "Mysterious figure, sci-fi adventure prompt" |
| Python error | setup.cfg, dependency installation, skipped tests | Disk I/O error (wrong) | "NoneType object is not iterable" (closer) |
| Recipe + math proof | Culinary schema + Galois theory, reductio ad absurdum | Pangram hallucination (wrong) | "Humorous blog post" (wrong) |

**Pattern.** Our AV leads with content and concepts: specific legal articles, named tokens, domain vocabulary. Anthropic's AV leads with format and genre: document type, structural expectations, continuation patterns. Both hallucinate specifics on unusual inputs, and both miss the recipe + math proof text entirely. For interpretability, content matters more than format, but Anthropic's format awareness catches things we miss, as in the Python error case, where its reading is closer than ours.

## Related

- AR: [anicka/nla-qwen2.5-7b-L20-ar-v2](https://huggingface.co/anicka/nla-qwen2.5-7b-L20-ar-v2)