---
base_model:
- Qwen/Qwen3-8B
---
  ---
  language:
  - en
  library_name: transformers
  base_model: Qwen/Qwen3-8B
  tags:
  - qwen3
  - sdft
  - sdpo
  - distillation
  - biology
  license: apache-2.0
  ---

  # qwen3-8b-biology-1h

  `qwen3-8b-biology-1h` is the **~1 hour wall-clock** checkpoint of `Qwen/Qwen3-8B` trained on biology with an SDPO-style self-distillation pipeline.

  ## Method

  This model follows the SDPO method from:

  - https://arxiv.org/abs/2601.20802v1

  ## Checkpoint

  - Snapshot: `step_10`
  - Format: sharded `safetensors`
  - Repo: `wambosec/qwen3-8b-biology-1h`

  ## Training Setup (this run)

  - Base model: `Qwen/Qwen3-8B`
  - Dataset: `sciknoweval/biology` (train split)
  - Teacher regularization: EMA
  - Distillation: top-k (`k=100`) + tail bucket
  - Importance sampling: token-level, clipped
  - Completions per prompt: 8
  - Max prompt length: 2048
  - Max completion length: 8192

  ## Repro (command used style)

  ```bash
  uv run sdft @ configs/sdft/generalization.toml \
    --trainer.data.dataset_name=../SDPO/datasets/sciknoweval/biology \
    --trainer.ckpt.interval=10 \
    --trainer.ckpt.keep-last=1 \
    --trainer.ckpt.weights.save-format=safetensors \
    --trainer.ckpt.weights.save-sharded

  ## Intended Use

  Research checkpoint for:

  - early training-dynamics analysis,
  - biology-domain probing,
  - continuation finetuning.

  ## Limitations

  - This is an intermediate checkpoint, not a final converged model.
  - No full safety/alignment evaluation is claimed here.
  - Metrics are not reported as a final benchmark release.

  ## Usage

  from transformers import AutoTokenizer, AutoModelForCausalLM

  repo = "wambosec/qwen3-8b-biology-1h"
  tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
      repo,
      torch_dtype="auto",
      device_map="auto",
      trust_remote_code=True,
  )

  ## Citation

  If you use this checkpoint, please cite SDPO:

  - https://arxiv.org/abs/2601.20802v1