vukien2301/qwen3-1.7b-sft-gpt54mini-math_cot

SFT fine-tune of Qwen/Qwen3-1.7B on vukien2301/ultrainteract_math_cot_gpt54mini (78,349 math problems from UltraInteract with chain-of-thought solutions regenerated by GPT-5.4-mini).

Training

Format: raw completion, no chat template; each sample is "{question}\n{model_prediction}" + EOS
DoRA r=128, alpha=256, dropout=0.05
Target modules: q,k,v,o,gate,up,down
LR 2e-5, 1 epoch, batch 8 × grad-accum 4 (eff. 32)
Adapter merged into base via peft.merge_and_unload()
Hardware: 1× B200, bf16, flash_attention_2
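
The settings above can be sketched in plain Python. This is an illustration, not the actual training script: `build_training_text` and `dora_kwargs` are hypothetical names, the `*_proj` module names are an assumed expansion of the short names in the list, and the `"<EOS>"` string stands in for whatever EOS token the tokenizer defines. The dictionary is shaped so that it could be passed to `peft.LoraConfig(**dora_kwargs)`, where `use_dora=True` enables DoRA.

```python
def build_training_text(question: str, model_prediction: str, eos_token: str) -> str:
    """Raw-completion sample: question, newline, CoT solution, then EOS (no chat template)."""
    return f"{question}\n{model_prediction}{eos_token}"


# Keyword arguments in the shape expected by peft.LoraConfig.
# Assumption: q,k,v,o,gate,up,down refer to the usual *_proj linear layers.
dora_kwargs = {
    "r": 128,
    "lora_alpha": 256,
    "lora_dropout": 0.05,
    "use_dora": True,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Batch 8 with 4 gradient-accumulation steps on a single GPU.
per_device_batch_size = 8
gradient_accumulation_steps = 4
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# "<EOS>" is a placeholder; in practice use tokenizer.eos_token.
sample = build_training_text(
    "What is 2 + 3?",
    "2 + 3 = 5. The answer is \\boxed{5}.",
    "<EOS>",
)
print(repr(sample))
print("effective batch size:", effective_batch_size)
```

Because training used raw completions, prompting the model with the bare question followed by a newline matches the training distribution more closely than a chat-template prompt.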

Evaluation

Task	Shots	Base Qwen3-1.7B	This model
GSM8K (flex)	5	69.74	70.13
GSM8K (strict)	5	69.37	67.55
GSM-Plus (flex)	5	50.07	50.26
GSM-Plus (strict)	5	49.46	47.51
MATH500	4	14.20	16.40

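The drop in strict-match scores is format drift (\boxed{} answers vs. the #### N format that strict matching expects), not capability loss: the flexible-match scores, which are slightly higher than the base model's on both GSM8K and GSM-Plus, confirm this.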
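
To illustrate why the two metrics can diverge, here is a minimal sketch of strict versus flexible answer extraction. The regular expressions are simplified stand-ins chosen for this example, not the exact filters used by the evaluation harness, and the function names are hypothetical.

```python
import re
from typing import Optional

# Strict: only accept an answer written in the "#### N" format.
STRICT_PATTERN = re.compile(r"#### (-?[0-9.,]+)")
# Flexible: accept any number in the text; the last one is taken as the answer.
NUMBER_PATTERN = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def strict_match(text: str) -> Optional[str]:
    """Return the number following '#### ', or None if the format is absent."""
    match = STRICT_PATTERN.search(text)
    return match.group(1).replace(",", "") if match else None


def flexible_match(text: str) -> Optional[str]:
    """Return the last number found anywhere in the text, or None."""
    numbers = NUMBER_PATTERN.findall(text)
    return numbers[-1].replace(",", "") if numbers else None


boxed_output = "6 * 7 = 42, so the answer is \\boxed{42}."
gsm_output = "6 * 7 = 42\n#### 42"

for name, text in [("boxed", boxed_output), ("gsm", gsm_output)]:
    print(name, "strict:", strict_match(text), "flex:", flexible_match(text))
```

A correct answer wrapped in \boxed{} is missed by the strict extractor but still found by the flexible one, which is the pattern the table shows.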

Built for EMNLP H2H-SD compound-pipeline experiments. See [github / paper link TBD].

Model size: 2B params (Safetensors)
Tensor type: BF16
