vukien2301/qwen3-1.7b-sft-gpt54mini-math_cot

SFT fine-tune of Qwen/Qwen3-1.7B on vukien2301/ultrainteract_math_cot_gpt54mini (78,349 math CoT problems from UltraInteract, regenerated with GPT-5.4-mini).

Training

  • Format: raw completion (no chat template): "{question}\n{model_prediction}" + EOS
  • DoRA r=128, alpha=256, dropout=0.05
  • Target modules: q,k,v,o,gate,up,down
  • LR 2e-5, 1 epoch, batch 8 × grad-accum 4 (eff. 32)
  • Adapter merged into base via peft.merge_and_unload()
  • Hardware: 1× B200, bf16, flash_attention_2
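
The settings above can be sketched roughly as follows. This is a minimal illustration, not the actual training script: the helper names (`format_example`, `build_dora_config`) are hypothetical, the full module names (`q_proj`, ...) are an assumed expansion of the shorthand in the list, and the EOS string should come from the tokenizer rather than being hard-coded.

```python
# Hypothetical sketch of the training setup; only the numeric values come from the card.

def format_example(question: str, model_prediction: str, eos_token: str) -> str:
    """Build one raw-completion training string (no chat template)."""
    return f"{question}\n{model_prediction}{eos_token}"


# Keyword arguments for peft.LoraConfig; use_dora=True switches LoRA to DoRA.
DORA_KWARGS = {
    "r": 128,
    "lora_alpha": 256,
    "lora_dropout": 0.05,
    "use_dora": True,
    # Assumption: the card's "q,k,v,o,gate,up,down" maps to these module names.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}


def build_dora_config():
    """Create the adapter config; peft is imported lazily so the rest runs without it."""
    from peft import LoraConfig
    return LoraConfig(**DORA_KWARGS)


# Effective batch size = per-device batch * gradient accumulation steps * devices.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
```

In practice `eos_token` would be `tokenizer.eos_token`, and after training the adapter would be folded into the base weights with `merge_and_unload()` as noted in the list.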

Evaluation (lm-evaluation-harness, vLLM backend)

Task               Shots  Base Qwen3-1.7B  This model
GSM8K (flex)       5      69.74            70.13
GSM8K (strict)     5      69.37            67.55
GSM-Plus (flex)    5      50.07            50.26
GSM-Plus (strict)  5      49.46            47.51
MATH500            4      14.20            16.40
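
For a quick read of the differences, the per-task deltas (this model minus base) can be computed directly from the reported scores. The snippet is only a convenience sketch; the variable names are illustrative.

```python
# Scores copied from the evaluation table: (base, this model).
scores = {
    "GSM8K (flex)": (69.74, 70.13),
    "GSM8K (strict)": (69.37, 67.55),
    "GSM-Plus (flex)": (50.07, 50.26),
    "GSM-Plus (strict)": (49.46, 47.51),
    "MATH500": (14.20, 16.40),
}

# Delta = fine-tuned score minus base score, rounded to two decimals.
deltas = {task: round(tuned - base, 2) for task, (base, tuned) in scores.items()}

for task, delta in deltas.items():
    print(f"{task:<18} {delta:+.2f}")
```

The flexible-extract and MATH500 rows come out positive, while both strict-match rows come out negative.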

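The strict-match drop reflects format drift (the model writes answers as \boxed{} instead of #### N), not capability loss; the flexible-extract scores, which are flat or slightly higher, confirm this.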
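
To illustrate why an answer format change affects only the strict metric, here is a simplified sketch of the two extraction styles. The regexes are approximations written for this example, not the exact filters used by lm-evaluation-harness, and the sample strings are made up.

```python
import re

# Strict: only accept an answer written in the GSM8K "#### N" convention.
STRICT_RE = re.compile(r"#### (-?[0-9.,]+)")
# Flexible: accept any number and take the last one in the text.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def strict_extract(text):
    """Return the number after '#### ', or None if that marker is absent."""
    match = STRICT_RE.search(text)
    return match.group(1).replace(",", "") if match else None


def flexible_extract(text):
    """Return the last number found anywhere in the text, or None."""
    numbers = NUMBER_RE.findall(text)
    return numbers[-1].replace(",", "") if numbers else None


# Made-up completions with the same correct answer in two formats.
gsm_style = "She sells 9 eggs at $2 each.\n#### 18"
boxed_style = "She sells 9 eggs at $2 each, so the answer is \\boxed{18}."

for completion in (gsm_style, boxed_style):
    print(strict_extract(completion), flexible_extract(completion))
```

Under these assumptions the boxed completion is scored wrong by the strict extractor even though the answer is correct, while the flexible extractor recovers it in both cases.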

Source

Built for EMNLP H2H-SD compound-pipeline experiments. See [github / paper link TBD].
