language: - en library_name: transformers base_model: Qwen/Qwen3-8B tags: - qwen3 - sdft - sdpo - distillation - biology license: apache-2.0

qwen3-8b-biology-1h

qwen3-8b-biology-1h is the ~1 hour wall-clock checkpoint of Qwen/Qwen3-8B trained on biology with an SDPO-style self-distillation pipeline.

Method

This model follows the SDPO method from:

https://arxiv.org/abs/2601.20802v1

Checkpoint

Snapshot: step_10
Format: sharded safetensors
Repo: wambosec/qwen3-8b-biology-1h

Training Setup (this run)

Base model: Qwen/Qwen3-8B
Dataset: sciknoweval/biology (train split)
Teacher regularization: EMA
Distillation: top-k (k=100) + tail bucket
Importance sampling: token-level, clipped
Completions per prompt: 8
Max prompt length: 2048
Max completion length: 8192

Repro (command used style)

uv run sdft @ configs/sdft/generalization.toml \
  --trainer.data.dataset_name=../SDPO/datasets/sciknoweval/biology \
  --trainer.ckpt.interval=10 \
  --trainer.ckpt.keep-last=1 \
  --trainer.ckpt.weights.save-format=safetensors \
  --trainer.ckpt.weights.save-sharded

## Intended Use

Research checkpoint for:

- early training-dynamics analysis,
- biology-domain probing,
- continuation finetuning.

## Limitations

- This is an intermediate checkpoint, not a final converged model.
- No full safety/alignment evaluation is claimed here.
- Metrics are not reported as a final benchmark release.

## Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

repo = "wambosec/qwen3-8b-biology-1h"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

## Citation

If you use this checkpoint, please cite SDPO:

- https://arxiv.org/abs/2601.20802v1