vukien2301/qwen3-1.7b-sft-gpt54mini-math_cot
SFT fine-tune of Qwen/Qwen3-1.7B on
vukien2301/ultrainteract_math_cot_gpt54mini
(78,349 GPT-5.4-mini regenerated math CoT problems from UltraInteract).
Training
- Format: raw completion (no chat template): "{question}\n{model_prediction}" + EOS
- DoRA r=128, alpha=256, dropout=0.05
- Target modules: q,k,v,o,gate,up,down
- LR 2e-5, 1 epoch, batch 8 × grad-accum 4 (eff. 32)
- Adapter merged into base via peft.merge_and_unload()
- Hardware: 1× B200, bf16, flash_attention_2
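The training format and hyperparameters above can be sketched in plain Python. This is an illustration, not the actual training script: the helper names (`build_training_text`, `build_prompt`), the `"<eos>"` placeholder, and the expansion of the short module names to `q_proj`, `k_proj`, etc. are assumptions. The `dora_kwargs` dict lists values that could be passed to `peft.LoraConfig`, which accepts `use_dora=True`.

```python
# Hypothetical helpers illustrating the raw-completion format (no chat template).

def build_training_text(question: str, model_prediction: str, eos_token: str) -> str:
    """Join question and CoT answer with a newline, then append EOS."""
    return f"{question}\n{model_prediction}{eos_token}"


def build_prompt(question: str) -> str:
    """At inference time the prompt is assumed to be the question plus a newline."""
    return f"{question}\n"


# Effective batch size from the listed settings (single GPU).
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
NUM_DEVICES = 1
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# Keyword arguments one might pass to peft.LoraConfig; the full module
# names are an assumed expansion of "q,k,v,o,gate,up,down".
dora_kwargs = dict(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    use_dora=True,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# "<eos>" is a placeholder; in practice use tokenizer.eos_token.
example = build_training_text("What is 2 + 3?", "2 + 3 = 5. \\boxed{5}", "<eos>")
print(example)
print(effective_batch)
```

Because the model was trained without a chat template, prompting it with the bare question followed by a newline is likely to match the training distribution more closely than a chat-formatted prompt.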
Evaluation (lm-evaluation-harness, vLLM backend)
| Task | Shots | Base Qwen3-1.7B | This model |
|---|---|---|---|
| GSM8K (flex) | 5 | 69.74 | 70.13 |
| GSM8K (strict) | 5 | 69.37 | 67.55 |
| GSM-Plus (flex) | 5 | 50.07 | 50.26 |
| GSM-Plus (strict) | 5 | 49.46 | 47.51 |
| MATH500 | 4 | 14.20 | 16.40 |
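The strict-match drop on GSM8K and GSM-Plus reflects format drift (final answers written as \boxed{} rather than #### N), not capability loss: the flexible-extract scores are on par with or slightly above the base model.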
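To illustrate why a format change alone can lower a strict-match score, here is a minimal sketch with two simplified extractors. The regexes are assumptions written for this example, not the exact filters used by lm-evaluation-harness: the strict one requires the `#### N` marker, and the flexible one takes the last number in the output.

```python
import re

# Simplified, illustrative patterns (not the harness's exact filters).
STRICT = re.compile(r"#### (-?[0-9.,]+)")
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def strict_extract(text: str):
    """Return the number after '#### ', or None if the marker is absent."""
    match = STRICT.search(text)
    return match.group(1).replace(",", "") if match else None


def flexible_extract(text: str):
    """Return the last number appearing anywhere in the text, or None."""
    numbers = NUMBER.findall(text)
    return numbers[-1].replace(",", "") if numbers else None


boxed = "So the total is 18.\n\\boxed{18}"   # style learned from the SFT data
hashed = "So the total is 18.\n#### 18"      # style expected by strict match

for name, output in [("boxed", boxed), ("hashed", hashed)]:
    print(name, strict_extract(output), flexible_extract(output))
```

Under these assumptions, the boxed output carries the correct answer but yields no strict match, while flexible extraction recovers the same value from both outputs.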
Source
Built for EMNLP H2H-SD compound-pipeline experiments. See [github / paper link TBD].