juliendenize committed on
Commit c4be198 · verified · 1 Parent(s): 50e443d

Fixing Transformers config

The YaRN scaling is not applied when mscale_all_dim is set to 1. As a result, the RoPE values were incorrect, which led to worse performance, especially at long context. This should fix the issue.
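
To illustrate why a value of 1 can switch the scaling off, here is a minimal sketch. It assumes the attention factor is selected the way the YaRN initialization in Transformers appears to do it: a ratio of two mscale terms when both mscale and mscale_all_dim are non-zero, and the default term otherwise. The helper name yarn_attention_factor is hypothetical and used only for illustration; it is not a library function.

```python
import math


def get_mscale(scale: float, mscale: float = 1.0) -> float:
    """YaRN magnitude term: 0.1 * mscale * ln(scale) + 1 for scale > 1, else 1."""
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def yarn_attention_factor(factor: float, mscale: float, mscale_all_dim: float) -> float:
    """Hypothetical helper mirroring the assumed selection logic (not a library API)."""
    if mscale and mscale_all_dim:
        # Both values are non-zero: the factor is a ratio of two mscale terms.
        # With mscale == mscale_all_dim the ratio cancels to exactly 1.0.
        return get_mscale(factor, mscale) / get_mscale(factor, mscale_all_dim)
    # Either value is 0 (or None): fall back to the default YaRN term.
    return get_mscale(factor)


# Values taken from the config.json hunk in this commit.
before = yarn_attention_factor(factor=64.0, mscale=1.0, mscale_all_dim=1.0)
after = yarn_attention_factor(factor=64.0, mscale=1.0, mscale_all_dim=0.0)

print(f"before: {before:.4f}")  # before: 1.0000 (no scaling)
print(f"after:  {after:.4f}")   # after:  1.4159 (0.1 * ln(64) + 1)
```

Under this assumption, the old setting gives a factor of exactly 1, so the cos/sin values are left unscaled, while setting mscale_all_dim to 0.0 restores the default factor of about 1.416 for a scaling factor of 64.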

Files changed (1)
  1. config.json +1 -1
config.json CHANGED
@@ -41,7 +41,7 @@
  "factor": 64.0,
  "llama_4_scaling_beta": 0,
  "mscale": 1.0,
- "mscale_all_dim": 1.0,
+ "mscale_all_dim": 0.0,
  "original_max_position_embeddings": 4096,
  "rope_theta": 1000000.0,
  "rope_type": "yarn",