opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516

This public repository contains LoRA adapter checkpoints from an OPSD training run.

Method

Base model: meta-llama/Llama-3.1-8B-Instruct
Method: OPSD fixed-teacher non-thinking full-vocabulary JSD with per-token clipping
Teacher: fixed base policy with LoRA adapters disabled during teacher forward passes
Loss: full-vocabulary generalized JSD with beta=0, which reduces to forward KL
Per-token JSD clipping: 1e-07
Student/teacher thinking flags: False / False
Dataset: siyanzhao/Openthoughts_math_30k_opsd
Train budget: max_steps=200, max_completion_length=1024
Batch: per_device_train_batch_size=1, gradient_accumulation_steps=2, effective batch 8 (1 per device x 2 accumulation steps x 4 GPUs)
vLLM: colocate, GPU memory utilization 0.35
GPUs: 4
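
The loss settings above can be illustrated with a small, dependency-free sketch. It is not the training code from this run. It assumes that "forward KL" means KL(teacher || student) computed over the full vocabulary at each token position, and that "per-token clipping" caps each token's divergence at the threshold; the actual implementation may apply the clip differently. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def forward_kl(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) over the full vocabulary (assumed direction)."""
    kl = sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher_probs, student_probs)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return max(kl, 0.0)


def clipped_token_losses(teacher_logits, student_logits, clip=1e-7):
    """Per-token divergence, capped at `clip` (assumed clipping semantics)."""
    losses = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        kl = forward_kl(softmax(t_row), softmax(s_row))
        losses.append(min(kl, clip))
    return losses


def sequence_loss(teacher_logits, student_logits, clip=1e-7):
    """Mean of the clipped per-token losses over the completion."""
    losses = clipped_token_losses(teacher_logits, student_logits, clip)
    return sum(losses) / len(losses)
```

Under these assumptions, a token where student and teacher agree exactly contributes zero, while any token whose divergence exceeds the threshold contributes exactly the threshold, so no single token can dominate the sequence loss.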

Only adapter/checkpoint artifacts and logs are uploaded; optimizer states are intentionally omitted.
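
Because only adapter weights are published, using them requires attaching them to the base model. The following is a minimal sketch with `transformers` and `peft`, not a tested recipe for this repository. The `subfolder` argument is a hypothetical placeholder: the checkpoint directory layout is not documented here, so check the repository's file listing first. The `teacher_logits` helper mirrors the fixed-teacher setup described under Method by running the base policy with the adapter disabled. Access to the base model on the Hub may require accepting its license.

```python
BASE_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER_REPO = (
    "yuxiaoyang/"
    "opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516"
)


def load_student(subfolder=None):
    """Load the base model and attach a LoRA adapter from this repository.

    `subfolder` is a hypothetical checkpoint directory inside the repo;
    leave it as None if the adapter files sit at the repository root.
    """
    # Heavy imports are kept local so the module can be inspected without them.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL, torch_dtype=torch.bfloat16
    )
    extra = {"subfolder": subfolder} if subfolder else {}
    model = PeftModel.from_pretrained(base, ADAPTER_REPO, **extra)
    model.eval()
    return tokenizer, model


def teacher_logits(model, **inputs):
    """Forward pass of the base policy, with the LoRA adapter disabled."""
    import torch

    with torch.no_grad(), model.disable_adapter():
        return model(**inputs).logits
```

A typical call would be `tokenizer, model = load_student()`, followed by `model.generate(...)` on tokenized chat-formatted input. Optimizer states are not uploaded, so these checkpoints are suited to inference or to starting a fresh fine-tuning run rather than resuming this one exactly.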

Model tree for yuxiaoyang/opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Adapter: this model