Qwen3.5-35B-A3B-DeepMed-6task-SFT-epoch1

SFT of Qwen3.5-35B-A3B on DeepMed distilled trajectories (6 tasks). This checkpoint was saved near the end of epoch 1, at global step 210; the epoch boundary falls at step 225.

  • Base: Qwen/Qwen3.5-35B-A3B
  • Data: 7,204 native-tool-call trajectories (distilled from 6 EHR tasks), filtered at 52K tokens
  • Framework: verl + Megatron-Core (TP=2, EP=8, PP=1), bf16, full CPU optimizer offload
  • Hyperparameters: lr 2e-5 (cosine), warmup 10 steps, weight decay 0.1, global batch 32
  • Train loss: ~0.20 | Val loss: ~0.236 at this step
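
The step counts and schedule above can be sanity-checked with a small sketch. The dataset size, global batch, peak learning rate, warmup length, and checkpoint step come from this card. Everything else is an assumption made for illustration: that the last partial batch is dropped, that warmup is linear, that the cosine decays to zero, and that training runs for two epochs (suggested only by the existence of the epoch-2 companion checkpoint). The helper names `steps_per_epoch` and `lr_at` are hypothetical and are not part of verl or Megatron-Core.

```python
import math

# Values taken from the bullets above.
NUM_TRAJECTORIES = 7204
GLOBAL_BATCH = 32
PEAK_LR = 2e-5
WARMUP_STEPS = 10
CHECKPOINT_STEP = 210

# Assumption (not stated on this card): training runs for two epochs.
ASSUMED_EPOCHS = 2


def steps_per_epoch(num_samples: int, global_batch: int) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is dropped."""
    return num_samples // global_batch


def lr_at(step: int, total_steps: int, peak_lr: float = PEAK_LR,
          warmup: int = WARMUP_STEPS, min_lr: float = 0.0) -> float:
    """Linear warmup followed by cosine decay to min_lr (assumed shape)."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup
    progress = min((step - warmup) / max(1, total_steps - warmup), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    per_epoch = steps_per_epoch(NUM_TRAJECTORIES, GLOBAL_BATCH)  # 225
    total = per_epoch * ASSUMED_EPOCHS
    print("steps per epoch:", per_epoch)
    print("fraction of epoch 1 seen at checkpoint:", CHECKPOINT_STEP / per_epoch)
    print("approx. lr at checkpoint step:", lr_at(CHECKPOINT_STEP, total))
```

Under these assumptions, 7,204 trajectories at a global batch of 32 give 225 steps per epoch, which matches the stated epoch boundary and places step 210 a little over 93% of the way through epoch 1. The learning rate printed for step 210 depends entirely on the assumed total step count, so treat it as indicative only.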

Companion checkpoint: Chtholly17/Qwen3.5-35B-A3B-DeepMed-6task-SFT-epoch2.
