opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516

This public repository contains LoRA adapter checkpoints from an OPSD training run.

Method

  • Base model: meta-llama/Llama-3.1-8B-Instruct
  • Method: OPSD with a fixed teacher, non-thinking mode, and full-vocabulary JSD with per-token clipping
  • Teacher: fixed base policy with LoRA adapters disabled during teacher forward passes
  • Loss: full-vocabulary generalized JSD with beta=0 (the forward-KL case)
  • Per-token JSD clipping: 1e-07
  • Student/teacher thinking flags: False / False
  • Dataset: siyanzhao/Openthoughts_math_30k_opsd
  • Train budget: max_steps=200, max_completion_length=1024
  • Batch: per_device_train_batch_size=1, gradient_accumulation_steps=2, effective batch 8 (1 × 2 × 4 GPUs)
  • vLLM: colocate, GPU memory utilization 0.35
  • GPUs: 4
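
To illustrate the loss and clipping settings listed above, here is a minimal pure-Python sketch of a per-token clipped forward KL computed over the full vocabulary. It is not the training code from this run. It assumes that beta=0 means KL(teacher || student), that "clipping" means capping each token's divergence at the clip value, and that tokens are averaged; the function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def forward_kl(teacher_logits, student_logits):
    """KL(teacher || student) for a single token position, over the full vocabulary."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(t, s))


def clipped_loss(teacher_seq, student_seq, clip=1e-7):
    """Mean over tokens of the per-token divergence, capped at `clip`.

    ASSUMPTION: the clip is an upper bound on each token's divergence and the
    reduction is a plain mean; the actual run may differ.
    """
    per_token = [
        min(forward_kl(t, s), clip) for t, s in zip(teacher_seq, student_seq)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    teacher = [[0.0, 0.0], [5.0, 0.0]]  # two token positions, vocabulary of 2
    student = [[0.0, 0.0], [0.0, 5.0]]
    print(clipped_loss(teacher, student))
```

Under these assumptions, a token where student and teacher agree contributes zero, and a token where they disagree strongly contributes at most the clip value, so no single position can dominate the update.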

Only adapter/checkpoint artifacts and logs are uploaded; optimizer states are intentionally omitted.
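
Since only adapter weights are uploaded, the base model has to be loaded separately and the adapter attached on top. The following is a hedged sketch using the `transformers` and `peft` libraries. The layout of checkpoints inside this repository is not documented here, so the `subfolder` argument and the `load_adapter` helper are assumptions for illustration.

```python
BASE_MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_REPO_ID = (
    "yuxiaoyang/opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516"
)


def load_adapter(subfolder=None):
    """Load the base model and attach a LoRA adapter from this repository.

    ASSUMPTION: `subfolder` names a checkpoint directory inside the repo
    (hypothetical; check the repository's file listing for the real layout).
    """
    # Imported lazily so the constants above can be used without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")

    # Pass `subfolder` only when given, so the repo root is used by default.
    kwargs = {"subfolder": subfolder} if subfolder else {}
    model = PeftModel.from_pretrained(base, ADAPTER_REPO_ID, **kwargs)
    return tokenizer, model
```

Calling `load_adapter()` downloads the gated base model, which requires accepting the Llama 3.1 license on the Hub. Because optimizer states are omitted, these checkpoints are suited to inference or evaluation rather than exact training resumption.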

Model tree: yuxiaoyang/opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516 is an adapter of meta-llama/Llama-3.1-8B-Instruct.