Fixing Transformers config
The YaRN scaling is not applied when mscale_all_dim is set to 1. This means the RoPE values were incorrect, leading to worse performance, especially for long context. This should fix the issue.
- config.json +1 -1
config.json CHANGED
@@ -41,7 +41,7 @@
   "factor": 64.0,
   "llama_4_scaling_beta": 0,
   "mscale": 1.0,
-  "mscale_all_dim":
+  "mscale_all_dim": 0.0,
   "original_max_position_embeddings": 4096,
   "rope_theta": 1000000.0,
   "rope_type": "yarn",
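To illustrate why this one-line change matters, here is a minimal sketch of how a YaRN attention scaling factor can be derived from `mscale` and `mscale_all_dim`. It is modeled on the general shape of the YaRN initialization logic in Transformers, but the function names `get_mscale` and `yarn_attention_factor` are hypothetical, and the exact upstream code may differ. The assumption is that when both values are non-zero the factor is a ratio of two mscale terms, so equal values cancel to 1.0, which disables the scaling.

```python
import math


def get_mscale(scale: float, mscale: float = 1.0) -> float:
    """YaRN magnitude correction: 0.1 * mscale * ln(scale) + 1 for scale > 1."""
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def yarn_attention_factor(factor: float, mscale: float, mscale_all_dim: float) -> float:
    """Hypothetical helper (assumed logic, not copied from the library).

    If both mscale and mscale_all_dim are truthy, use their ratio;
    otherwise fall back to the default YaRN factor.
    """
    if mscale and mscale_all_dim:
        return get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim)
    return get_mscale(factor)


# Values taken from the config in this commit: factor=64.0, mscale=1.0.
before = yarn_attention_factor(64.0, 1.0, 1.0)  # mscale_all_dim = 1: ratio cancels
after = yarn_attention_factor(64.0, 1.0, 0.0)   # mscale_all_dim = 0.0: default path

print(before)  # 1.0, i.e. no scaling applied
print(after)   # roughly 1.416, i.e. 0.1 * ln(64) + 1
```

Under these assumptions, setting `mscale_all_dim` to 0.0 makes the fallback branch run, so the factor of 64 yields a non-trivial attention scale instead of 1.0.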