How to use yuxiaoyang/opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter from this repository.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base_model, "yuxiaoyang/opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516")
```
opsd-llama31-8b-instruct-nonthink-gen1024-step200-jsdclip1e-7-20260516
This public repository contains LoRA adapter checkpoints from an OPSD training run.
Method
- Base model: `meta-llama/Llama-3.1-8B-Instruct`
- Method: OPSD fixed-teacher non-thinking full-vocabulary JSD with per-token clipping
- Teacher: fixed base policy with LoRA adapters disabled during teacher forward passes
- Loss: full-vocabulary forward KL/JSD, `beta=0`
- Per-token JSD clipping: `1e-07`
- Student/teacher thinking flags: `False` / `False`
- Dataset: `siyanzhao/Openthoughts_math_30k_opsd`
- Train budget: `max_steps=200`, `max_completion_length=1024`
- Batch: `per_device_train_batch_size=1`, `gradient_accumulation_steps=2`, effective batch `8`
- vLLM: `colocate`, GPU memory utilization `0.35`
- GPUs: 4
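
To make the loss settings above more concrete, here is a minimal pure-Python sketch of a full-vocabulary forward KL with a per-token clip. It is an illustration under stated assumptions, not the training code from this run: it reads `beta=0` as forward KL, i.e. KL(teacher || student), and it assumes the clip is an upper clamp on each position's divergence before averaging. This card does not say how the clip is actually applied. The function names `forward_kl` and `clipped_mean_loss` are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def forward_kl(student_logits, teacher_logits):
    """KL(teacher || student) for one token position, summed over the full vocabulary.

    Assumption: this is what the card's `beta=0` setting corresponds to.
    """
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    return sum(math.exp(tl) * (tl - sl) for sl, tl in zip(s, t))


def clipped_mean_loss(student_seq, teacher_seq, clip):
    """Average the per-position divergence after clamping each value at `clip`.

    Assumption: "per-token clipping" means an upper clamp per position;
    the real run may apply the clip differently.
    """
    per_token = [forward_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    clipped = [min(d, clip) for d in per_token]
    return sum(clipped) / len(clipped)


# Toy example: two positions, vocabulary of size 3.
student = [[0.0, 0.0, 0.0], [2.0, 0.0, -1.0]]
teacher = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.5]]
loss = clipped_mean_loss(student, teacher, clip=1e-07)
```

In this toy case the first position has identical distributions and contributes zero, while the second position's divergence exceeds the clip and is clamped to it. Under this reading, a clip as small as `1e-07` caps the contribution of almost every position where the student and teacher disagree.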
Only adapter/checkpoint artifacts and logs are uploaded; optimizer states are intentionally omitted.