Commit a2d4bbe by sweelol · verified · 1 Parent(s): 0daf645

Update README.md

Files changed (1)
  1. README.md +19 -0
README.md CHANGED
@@ -27,6 +27,25 @@ This model is part of the **Sweelol AI Hub** collection, resulting from experime
 
  This is a placeholder README. A detailed model card with full results and usage instructions will be added shortly.
 
+ ## Evaluation Results
+
+ This table compares the performance of this **Finetuned-Pruned** model against the original, un-tuned `google/gemma-3-270m` base model.
+
+ | Benchmark Task | Sweelol Finetuned-Pruned | Baseline (Gemma-3-270m) | Change |
+ | :--- | :--- | :--- | :--- |
+ | **Average MMLU (5 tasks)** | 25.18% | 24.88% | **+0.30%** |
+ | HellaSwag (Common Sense) | 29.50% | 43.50% | -14.00% |
+ | ---------------------------------- | ---------- | ---------- | -------- |
+ | *MMLU Sub-task Breakdown:* | | | |
+ | MMLU - Formal Logic | **28.57%** | 25.40% | **+3.17%** |
+ | MMLU - High School Computer Science | **25.00%** | 24.00% | **+1.00%** |
+ | MMLU - Professional Law | 25.00% | 27.00% | -2.00% |
+ | MMLU - Abstract Algebra | 22.00% | 22.00% | 0.00% |
+ | MMLU - High School Mathematics | 21.00% | 26.00% | -5.00% |
+
+ #### Summary of Findings
+ Fine-tuning the pruned model resulted in a slight overall improvement on MMLU (+0.30 points on the reported average), with the largest gain in formal logic (+3.17 points). However, like the pruned-only baseline, it showed a significant drop in common-sense reasoning (HellaSwag, -14.00 points relative to the base model).
+
 
  ## Evaluation
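The averages and deltas in the table added by this diff can be sanity-checked with a few lines of Python. The sketch below is not part of the commit: the dictionary and function names are illustrative, and it assumes the "Average MMLU (5 tasks)" row is a plain unweighted mean of the five listed sub-tasks. Under that assumption the baseline comes out at 24.88%, matching the table, while the finetuned-pruned scores average 24.31% rather than the reported 25.18%. The reported figure may therefore use a different weighting (for example, by number of questions per task) or a different set of tasks.

```python
# Scores copied from the table in the diff above (percent accuracy).
# Names below are illustrative and not part of the original README.
MMLU_SUBTASKS = {
    # task name: (finetuned-pruned, baseline)
    "Formal Logic": (28.57, 25.40),
    "High School Computer Science": (25.00, 24.00),
    "Professional Law": (25.00, 27.00),
    "Abstract Algebra": (22.00, 22.00),
    "High School Mathematics": (21.00, 26.00),
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every task counts equally."""
    values = list(values)
    return sum(values) / len(values)


def summarize(subtasks):
    """Return (finetuned mean, baseline mean, difference), rounded to 2 places."""
    tuned = unweighted_mean(t for t, _ in subtasks.values())
    base = unweighted_mean(b for _, b in subtasks.values())
    return round(tuned, 2), round(base, 2), round(tuned - base, 2)


if __name__ == "__main__":
    tuned, base, delta = summarize(MMLU_SUBTASKS)
    print(f"finetuned-pruned mean: {tuned:.2f}%")
    print(f"baseline mean:         {base:.2f}%")
    print(f"change:                {delta:+.2f} points")
```

Because the gap between the two models' averages is well under one point and all scores sit close to the 25% expected from random guessing on four-option questions, any recomputation like this is best read as a consistency check on the table, not as evidence of a real capability difference.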