Update README.md
README.md
CHANGED
@@ -27,6 +27,25 @@ This model is part of the **Sweelol AI Hub** collection, resulting from experime
 
 This is a placeholder README. A detailed model card with full results and usage instructions will be added shortly.
 
+## Evaluation Results
+
+The table below compares this **Finetuned-Pruned** model against the original, untuned `google/gemma-3-270m` base model.
+
+| Benchmark Task | Sweelol Finetuned-Pruned | Baseline (Gemma-3-270m) | Change (percentage points) |
+| :--- | :--- | :--- | :--- |
+| **Average MMLU (5 tasks)** | 25.18% | 24.88% | **+0.30** |
+| HellaSwag (Common Sense) | 29.50% | 43.50% | -14.00 |
+| ---------------------------------- | ---------- | ---------- | -------- |
+| *MMLU Sub-task Breakdown:* | | | |
+| MMLU - Formal Logic | **28.57%** | 25.40% | **+3.17** |
+| MMLU - High School Computer Science | **25.00%** | 24.00% | **+1.00** |
+| MMLU - Professional Law | 25.00% | 27.00% | -2.00 |
+| MMLU - Abstract Algebra | 22.00% | 22.00% | 0.00 |
+| MMLU - High School Mathematics | 21.00% | 26.00% | -5.00 |
+
+#### Summary of Findings
+Fine-tuning the pruned model yielded a small gain in average MMLU (25.18% vs. 24.88% for the base model), led by formal logic (+3.17 points), while professional law and high school mathematics regressed. However, like the pruned-only model, it dropped sharply on common-sense reasoning (HellaSwag: 29.50% vs. 43.50%).
+
 
 ## Evaluation
 
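The averages and per-task changes in the table added by this hunk can be recomputed from the five sub-task rows. The following is a minimal sketch, assuming "Average MMLU (5 tasks)" means an unweighted mean of the listed sub-task accuracies (the README does not state the weighting). The names `mmlu_scores` and `unweighted_mean` are illustrative and not part of the repository.

```python
# Sub-task accuracies (%) copied from the table in the hunk above:
# (fine-tuned-pruned, baseline). Keys are illustrative labels only.
mmlu_scores = {
    "formal_logic": (28.57, 25.40),
    "high_school_computer_science": (25.00, 24.00),
    "professional_law": (25.00, 27.00),
    "abstract_algebra": (22.00, 22.00),
    "high_school_mathematics": (21.00, 26.00),
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every sub-task counts equally."""
    values = list(values)
    return sum(values) / len(values)


finetuned_avg = unweighted_mean(ft for ft, _ in mmlu_scores.values())
baseline_avg = unweighted_mean(base for _, base in mmlu_scores.values())

# Per-task change in percentage points (fine-tuned minus baseline).
deltas = {task: round(ft - base, 2) for task, (ft, base) in mmlu_scores.items()}

print(f"fine-tuned avg: {finetuned_avg:.2f}%")
print(f"baseline avg:   {baseline_avg:.2f}%")
print(f"change:         {finetuned_avg - baseline_avg:+.2f} pp")
```

Under this equal-weighting assumption the baseline column reproduces the reported 24.88%, but the fine-tuned column comes out at 24.31% instead of the reported 25.18%. The reported average may therefore use a different weighting (for example by question count) or different underlying figures, and is worth confirming against the raw evaluation output.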