---
language:
- en
library_name: transformers
base_model: Qwen/Qwen3-8B
tags:
- qwen3
- sdft
- sdpo
- distillation
- biology
license: apache-2.0
---

# qwen3-8b-biology-1h

`qwen3-8b-biology-1h` is the **~1 hour wall-clock** checkpoint of `Qwen/Qwen3-8B` trained on biology with an SDPO-style self-distillation pipeline.

## Method

This model follows the SDPO method from:

- https://arxiv.org/abs/2601.20802v1

## Checkpoint

- Snapshot: `step_10`
- Format: sharded `safetensors`
- Repo: `wambosec/qwen3-8b-biology-1h`

## Training Setup (this run)

- Base model: `Qwen/Qwen3-8B`
- Dataset: `sciknoweval/biology` (train split)
- Teacher regularization: EMA
- Distillation: top-k (`k=100`) + tail bucket
- Importance sampling: token-level, clipped
- Completions per prompt: 8
- Max prompt length: 2048
- Max completion length: 8192

## Repro (style of command used)

```bash
uv run sdft @ configs/sdft/generalization.toml \
  --trainer.data.dataset_name=../SDPO/datasets/sciknoweval/biology \
  --trainer.ckpt.interval=10 \
  --trainer.ckpt.keep-last=1 \
  --trainer.ckpt.weights.save-format=safetensors \
  --trainer.ckpt.weights.save-sharded
```

## Intended Use

Research checkpoint for:

- early training-dynamics analysis,
- biology-domain probing,
- continuation finetuning.

## Limitations

- This is an intermediate checkpoint, not a final converged model.
- No full safety/alignment evaluation is claimed here.
- Metrics are not reported as a final benchmark release.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "wambosec/qwen3-8b-biology-1h"

# Load the tokenizer and the sharded safetensors weights from the Hub.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
```

## Citation

If you use this checkpoint, please cite SDPO:

- https://arxiv.org/abs/2601.20802v1
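
## Sketch: top-k + tail bucket distillation

The Training Setup entry "top-k (`k=100`) + tail bucket" can be read as follows: for each token position, keep the probabilities of the `k` most likely tokens and merge all remaining probability mass into one extra "tail" bucket, then compare teacher and student on that reduced distribution. The snippet below is a minimal, pure-Python sketch of that reading and is not taken from the training code. The function names, the choice to select the top-k set from the teacher, and the KL(teacher || student) direction are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def topk_with_tail(teacher_probs, student_probs, k):
    """Reduce both distributions to k entries plus one tail bucket.

    Assumption: the top-k token set is chosen from the teacher distribution.
    """
    order = sorted(
        range(len(teacher_probs)),
        key=lambda i: teacher_probs[i],
        reverse=True,
    )
    top_idx = order[:k]
    t = [teacher_probs[i] for i in top_idx]
    s = [student_probs[i] for i in top_idx]
    # The tail bucket holds whatever mass falls outside the top-k set.
    t.append(max(0.0, 1.0 - sum(t)))
    s.append(max(0.0, 1.0 - sum(s)))
    return t, s


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over two aligned probability lists (assumed loss direction)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Toy example with a 5-token vocabulary and k=2 (the run above used k=100).
teacher = softmax([2.0, 1.0, 0.5, -1.0, -3.0])
student = softmax([1.5, 1.2, 0.0, -0.5, -2.0])
t_red, s_red = topk_with_tail(teacher, student, k=2)
loss = kl_divergence(t_red, s_red)
```

In a real trainer this would run on batched logits over the full vocabulary, which is where the reduction pays off: only `k + 1` values per position need to be stored or exchanged instead of one per vocabulary entry.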