mrs83 committed on
Commit fd19ffe · verified · 1 Parent(s): 294e976

Update README.md

  1. README.md +81 -6
README.md CHANGED
@@ -16,10 +16,6 @@ tags:
 
  # Model Card for Kurtis-EON1-Hybrid-0.7B-v0.1.1
 
- Work in Progress!
-
- # Model Analysis: Kurtis-EON1-Hybrid-0.7B-v0.1.1
-
  ## 🏗️ Hybrid Architecture Details
  | Property | Value |
  | :--- | :--- |
@@ -49,5 +45,84 @@ Work in Progress!
  - **DSRN Parameter Overhead**: 6.67% additional parameters compared to base.
  - **Hybrid Ratio**: 1 DSRN block for every 4 attention layers.
 
- ---
- *Generated by `scripts/inspect_hybrid_params.py`*
+ ## 📊 Master Evaluation Report: Kurtis-EON1 v0.1.1
+
+ *Generated on 2026-05-13 19:53:37*
+
+ ### 🎯 0-Shot Gauntlet Results
+ | Task | Metric | Value | Stderr |
+ | :--- | :--- | :--- | :--- |
+ | hellaswag | Acc Norm | 0.4698 | ±0.0050 |
+ | piqa | Acc Norm | 0.6882 | ±0.0108 |
+ | sciq | Acc Norm | 0.9210 | ±0.0085 |
+ | truthfulqa_gen | Bleu Acc | 0.3158 | ±0.0163 |
+ | truthfulqa_mc1 | Acc | 0.2436 | ±0.0150 |
+ | truthfulqa_mc2 | Acc | 0.4178 | ±0.0148 |
+ | arc_challenge | Acc Norm | 0.3532 | ±0.0140 |
+ | gsm8k | Exact Match | 0.1365 | ±0.0095 |
+
+ **Reproduction Command:**
+ ```bash
+ uv run lm_eval --model hf \
+ --model_args pretrained=mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1,trust_remote_code=True \
+ --tasks hellaswag,piqa,sciq,truthfulqa,arc_challenge,gsm8k \
+ --apply_chat_template \
+ --fewshot_as_multiturn \
+ --output_path ./results/Kurtis-EON1-v0.1.1-Gauntlet-0-shot \
+ --batch_size 1 \
+ --num_fewshot 0
+ ```
+
+ ----------------------------------------
+
+ ### 🎯 1-Shot Gauntlet Results
+ | Task | Metric | Value | Stderr |
+ | :--- | :--- | :--- | :--- |
+ | hellaswag | Acc Norm | 0.4679 | ±0.0050 |
+ | piqa | Acc Norm | 0.6942 | ±0.0107 |
+ | sciq | Acc Norm | 0.9160 | ±0.0088 |
+ | truthfulqa_gen | Bleu Acc | 0.3158 | ±0.0163 |
+ | truthfulqa_mc1 | Acc | 0.2436 | ±0.0150 |
+ | truthfulqa_mc2 | Acc | 0.4178 | ±0.0148 |
+ | arc_challenge | Acc Norm | 0.3242 | ±0.0137 |
+ | gsm8k | Exact Match | 0.2335 | ±0.0117 |
+
+ **Reproduction Command:**
+ ```bash
+ uv run lm_eval --model hf \
+ --model_args pretrained=mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1,trust_remote_code=True \
+ --tasks hellaswag,piqa,sciq,truthfulqa,arc_challenge,gsm8k \
+ --apply_chat_template \
+ --fewshot_as_multiturn \
+ --output_path ./results/Kurtis-EON1-v0.1.1-Gauntlet-1-shot \
+ --batch_size 1 \
+ --num_fewshot 1
+ ```
+
+ ----------------------------------------
+
+ ### 🎯 5-Shot Gauntlet Results
+ | Task | Metric | Value | Stderr |
+ | :--- | :--- | :--- | :--- |
+ | hellaswag | Acc Norm | 0.4667 | ±0.0050 |
+ | piqa | Acc Norm | 0.6937 | ±0.0108 |
+ | sciq | Acc Norm | 0.9230 | ±0.0084 |
+ | truthfulqa_gen | Bleu Acc | 0.3158 | ±0.0163 |
+ | truthfulqa_mc1 | Acc | 0.2436 | ±0.0150 |
+ | truthfulqa_mc2 | Acc | 0.4178 | ±0.0148 |
+ | arc_challenge | Acc Norm | 0.3507 | ±0.0139 |
+ | gsm8k | Exact Match | 0.2153 | ±0.0113 |
+
+ **Reproduction Command:**
+ ```bash
+ uv run lm_eval --model hf \
+ --model_args pretrained=mrs83/Kurtis-EON1-Hybrid-0.7B-v0.1.1,trust_remote_code=True \
+ --tasks hellaswag,piqa,sciq,truthfulqa,arc_challenge,gsm8k \
+ --apply_chat_template \
+ --fewshot_as_multiturn \
+ --output_path ./results/Kurtis-EON1-v0.1.1-Gauntlet-5-shot \
+ --batch_size 1 \
+ --num_fewshot 5
+ ```
+
+ ----------------------------------------
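
The Stderr columns in the added tables allow a rough check of which few-shot differences exceed sampling noise. The following is a minimal sketch, not part of the commit: it copies two rows from the tables, builds a normal-approximation 95% interval, and computes a z-score between the 0-shot and 1-shot runs. The helper names `ci95` and `z_score` are hypothetical, and the z-score treats the two runs as independent samples, which is an assumption (both runs score the same test items, so a paired test would be more precise).

```python
import math

# (value, stderr) pairs copied from the 0-, 1-, and 5-shot tables in the diff.
RESULTS = {
    "gsm8k": {0: (0.1365, 0.0095), 1: (0.2335, 0.0117), 5: (0.2153, 0.0113)},
    "arc_challenge": {0: (0.3532, 0.0140), 1: (0.3242, 0.0137), 5: (0.3507, 0.0139)},
}


def ci95(value, stderr):
    """Normal-approximation 95% confidence interval around a reported score."""
    half_width = 1.96 * stderr
    return (value - half_width, value + half_width)


def z_score(a, b):
    """z-score of the difference b - a, assuming the two runs are independent."""
    (value_a, stderr_a), (value_b, stderr_b) = a, b
    return (value_b - value_a) / math.sqrt(stderr_a**2 + stderr_b**2)


for task, shots in RESULTS.items():
    low, high = ci95(*shots[0])
    z = z_score(shots[0], shots[1])
    print(f"{task}: 0-shot 95% CI [{low:.4f}, {high:.4f}], 0-shot vs 1-shot z = {z:+.2f}")
```

Under these assumptions, the gsm8k gain from 0-shot to 1-shot sits well outside the noise (z above 6), while the arc_challenge drop stays within it (|z| below 1.96).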