srv-sngh committed on
Commit a4a7da5 · verified · 1 Parent(s): c10a36a

Card: add agentic HumanEval+ and base MBPP+ partial cells

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -62,10 +62,10 @@ EvalPlus set; GSM8K is a 150‑problem subset, 8‑shot. **Not** EvalPlus‑lead
 
 | Model | mode | HumanEval | HumanEval+ | MBPP | MBPP+ | GSM8K |
 |---|---|---|---|---|---|---|
-| Google gemma-4-12B-it (base) | off | 57.3 | 56.7 | 42.1 | | 95.3 |
-| Google gemma-4-12B-it (base) | on | 48.8 | 48.8 | 49.5 | | — |
-| agentic v2 (this model) | off | 83.5 | | — | — | — |
-| agentic v2 (this model) | on | 86.0 | | — | — | — |
+| Google gemma-4-12B-it (base) | off | 57.3 | 56.7 | 42.1 | 37.6 | 95.3 |
+| Google gemma-4-12B-it (base) | on | 48.8 | 48.8 | 49.5 | 43.9 | — |
+| agentic v2 (this model) | off | 83.5 | 81.7 | — | — | — |
+| agentic v2 (this model) | on | 86.0 | 82.9 | — | — | — |
 
 > ⚠️ **Partial results — this run is still in progress.** Empty cells (—) are filling in; the card will be updated as the full sweep (coder + remaining agentic/GSM8K) completes.
 