nicholasKluge committed · Commit 5a511c3 · verified · 1 Parent(s): 46d2f30

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -291,9 +291,9 @@ Hence, even though Tucano2-qwen-0.5B-Instruct is released with a permissive lice
 
 The table below compares the Tucano2 (Instruct variant) series against other chat models of similar size. We divide our evaluations into three sets:
 
-- **Knowledge & Reasoning** ARC-Challenge, ENEM, BLUEX, OAB Exams, BELEBELE, MMLU, GSM8K-PT
-- **Instruction Following** IFEval-PT
-- **Coding** HumanEval
+- **Knowledge & Reasoning:** ARC-Challenge, ENEM, BLUEX, OAB Exams, BELEBELE, MMLU, GSM8K-PT
+- **Instruction Following:** IFEval-PT
+- **Coding:** HumanEval
 
 The NPM (Normalized Performance Metric) provides a balanced view of model performance across tasks, accounting for each task's inherent difficulty by normalizing its evaluation score relative to its random baseline.
 
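
Note on the NPM sentence in the README context above: the diff does not give the formula, so the following is only a minimal sketch of one common way to normalize a score against its random baseline. It assumes scores are on a 0-100 scale, that the normalization is linear (random baseline maps to 0, the maximum maps to 100), and that the per-task values are averaged with equal weight. The function names and the task names in the usage note are hypothetical and are not taken from the Tucano2 evaluation code.

```python
from statistics import mean


def normalized_score(score, random_baseline, max_score=100.0):
    """Linearly rescale a score so the random baseline maps to 0 and max_score to 100.

    Assumption: this linear form is illustrative; the README text in the diff
    does not spell out the exact formula.
    """
    if max_score <= random_baseline:
        raise ValueError("max_score must be greater than random_baseline")
    return 100.0 * (score - random_baseline) / (max_score - random_baseline)


def npm(results):
    """Average the normalized scores over tasks with equal weight.

    `results` maps a task name to a (score, random_baseline) pair.
    """
    return mean(normalized_score(score, baseline) for score, baseline in results.values())
```

For example, with the made-up values `npm({"four_choice_task": (62.5, 25.0), "open_ended_task": (50.0, 0.0)})`, both tasks normalize to the same value even though their raw scores differ, because the four-choice task starts from a 25-point random baseline.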