Create README.md
README.md
ADDED
|
@@ -0,0 +1,33 @@
| 1 |
+
---
|
| 2 |
+
language:
|
| 3 |
+
- en
|
| 4 |
+
base_model:
|
| 5 |
+
- meta-llama/Meta-Llama-3-8B
|
| 6 |
+
---
|
| 7 |
+
## haoranli-ml/Llama-3-8B-HardClip-64k-Base
|
| 8 |
+
|
| 9 |
+
[arXiv:2602.05258](https://arxiv.org/abs/2602.05258)
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
### ✨ Overview
|
| 13 |
+
**CoPE** is a plug-and-play enhancement of RoPE that *softly* clips the unstable low-frequency components, delivering consistent gains both **within the training context** and during **long-context extrapolation**.
|
| 14 |
+
|
| 15 |
+
With a simple yet effective soft clipping strategy, CoPE:
|
| 16 |
+
|
| 17 |
+
1️⃣ **Eliminates severe OOD outliers**, the low-frequency components whose periods exceed the pre-training context window and which are the primary cause of degraded OOD extrapolation.
|
| 18 |
+
|
| 19 |
+
2️⃣ **Refines Long-range Semantic Signals** by alleviating the secret *long-term decay of semantic attention* introduced by RoPE.
|
| 20 |
+
|
| 21 |
+
3️⃣ **Prevents Spectral Leakage** induced by hard frequency truncation, which otherwise leads to long-range oscillatory ringing in the attention scores across relative token distances and introduces spurious correlations.
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
### 📖 Citation
|
| 26 |
+
```
|
| 27 |
+
@article{li2026cope,
|
| 28 |
+
title={CoPE: Clipped RoPE as A Scalable Free Lunch for Long Context LLMs},
|
| 29 |
+
author={Li, Haoran and Ren, Sucheng and Yuille, Alan and Wang, Feng},
|
| 30 |
+
journal={arXiv preprint arXiv:2602.05258},
|
| 31 |
+
year={2026}
|
| 32 |
+
}
|
| 33 |
+
```
|