Qwen3.5 9B Typst GRPO LoRA
This repository contains the adapter-only checkpoint from the VERL Typst APPS GRPO run that completed one full training step on 2026-04-23. It does not include merged base-model weights.
The run was initialized from the local warm SFT merged model at /workspace/typst_universe_scrape/outputs/qwen35-9b-merged, itself based on Qwen/Qwen3.5-9B.
For exact reproduction, load this adapter on top of that warm merged model rather than the plain public base.
Contents
- adapter_model.safetensors: exported LoRA adapter weights from the VERL FSDP checkpoint
- adapter_config.json: PEFT adapter configuration
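Before loading, it can help to confirm that a downloaded adapter_config.json agrees with the hyperparameters listed under Training Snapshot. The following is a minimal sketch using only the standard library. The helper name `check_adapter_config` and the `EXPECTED` table are illustrative and not part of this repository, and the key names (`r`, `lora_alpha`, `lora_dropout`) assume the usual PEFT LoRA config layout.

```python
import json
from pathlib import Path

# Values copied from the Training Snapshot section of this card.
EXPECTED = {"r": 64, "lora_alpha": 128, "lora_dropout": 0.0}


def check_adapter_config(path, expected=EXPECTED):
    """Return {key: (found, wanted)} for every key that does not match."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    mismatches = {}
    for key, wanted in expected.items():
        found = config.get(key)  # None if the key is absent
        if found != wanted:
            mismatches[key] = (found, wanted)
    return mismatches
```

An empty result from `check_adapter_config("adapter_config.json")` means the file matches the card; any other result lists the value found next to the value expected.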
Loading
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Local path to the warm SFT merged model (not the plain public base).
base_model = "/path/to/qwen35-9b-merged"
adapter = "uam-rl/qwen35-9b-typst-grpo-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",
    device_map="auto",
)
# Attach the LoRA adapter weights on top of the loaded base model.
model = PeftModel.from_pretrained(model, adapter)
Training Snapshot
- Method: GRPO via VERL, DrGRPO-style, with advantage normalization disabled
- Warm start: local SFT merged model, not the plain public base
- Adapter: LoRA rank 64, alpha 128, dropout 0.0
- Optimizer: MuonWithAdamW hybrid
- Rollout engine: vLLM, tensor parallel size 2
- Rollout cap: 32k response tokens with thinking enabled
- Checkpoint source: /workspace/eval_results/typst_grpo_real_bf16_sp/checkpoints/global_step_1/actor
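The Method entry above can be illustrated with a small sketch of group-relative advantages. It assumes that "advantage normalization disabled" means each rollout's reward is centered on its group mean but not divided by the group standard deviation, as in Dr. GRPO. The function name, the `eps` value, and the use of the population standard deviation are illustrative choices; this is not the VERL implementation.

```python
from statistics import fmean, pstdev


def group_advantages(rewards, normalize_std=False, eps=1e-6):
    """Advantages for one prompt's group of rollout rewards.

    normalize_std=False: r_i - mean(r)                     (assumed setting for this run)
    normalize_std=True:  (r_i - mean(r)) / (std(r) + eps)  (std-normalized GRPO form)
    """
    mean = fmean(rewards)
    centered = [r - mean for r in rewards]
    if not normalize_std:
        return centered
    std = pstdev(rewards)  # population std; an illustrative choice
    return [c / (std + eps) for c in centered]
```

With normalization disabled, the advantage magnitudes keep the scale of the raw rewards, so a group whose rewards are nearly identical contributes small advantages instead of being rescaled to unit variance.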