---
license: apache-2.0
base_model: Qwen/Qwen3.5-35B-A3B
library_name: transformers
tags:
- qwen3_5
- moe
- medical
- sft
- deepmed
---

# Qwen3.5-35B-A3B-DeepMed-6task-SFT-epoch1

SFT of Qwen3.5-35B-A3B on DeepMed distilled trajectories (6 tasks): **end of epoch 1** (global step 210; epoch boundary at step 225).

- Base: `Qwen/Qwen3.5-35B-A3B`
- Data: 7,204 native-tool-call trajectories (distilled from 6 EHR tasks), filtered at 52K tokens
- Framework: verl + Megatron-Core (TP=2, EP=8, PP=1), bf16, full CPU optimizer offload
- Hyperparameters: lr 2e-5 (cosine), warmup 10 steps, weight decay 0.1, global batch 32
- Train loss: ~0.20 | Val loss: ~0.236 at this step

Companion checkpoint: [`Chtholly17/Qwen3.5-35B-A3B-DeepMed-6task-SFT-epoch2`](https://huggingface.co/Chtholly17/Qwen3.5-35B-A3B-DeepMed-6task-SFT-epoch2).
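
The step counts and schedule above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the training code. It assumes that all 7,204 trajectories are in the training split and that a trailing partial batch is dropped, which is consistent with the stated epoch boundary at step 225. The `cosine_lr` helper is a generic linear-warmup plus cosine-decay formula. The total step count (two epochs) and the zero floor are assumptions, so the exact learning rate at step 210 in the real run may differ.

```python
import math

# Values stated in this card.
NUM_TRAJECTORIES = 7204
GLOBAL_BATCH = 32
PEAK_LR = 2e-5
WARMUP_STEPS = 10


def steps_per_epoch(num_samples: int, global_batch: int) -> int:
    """Full optimizer steps per epoch.

    ASSUMPTION: the trailing partial batch is dropped (floor division).
    """
    return num_samples // global_batch


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS, min_lr: float = 0.0) -> float:
    """Linear warmup followed by cosine decay to min_lr.

    ASSUMPTION: this is a generic schedule shape, not necessarily the exact
    one used by the training framework; total_steps and min_lr are guesses.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


spe = steps_per_epoch(NUM_TRAJECTORIES, GLOBAL_BATCH)
print("steps per epoch:", spe)  # 225, matching the stated epoch boundary

total_steps = 2 * spe  # ASSUMPTION: the run spans two epochs
print("approx. lr at step 210:", cosine_lr(210, total_steps))
```

Under these assumptions, step 210 sits 15 optimizer steps (about 480 trajectories) before the epoch boundary.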