Qwen3.5 9B Typst GRPO LoRA

This repository contains the adapter-only checkpoint from the VERL Typst APPS GRPO run that completed one full training step on 2026-04-23. It does not include merged base-model weights.

The run was warm-started from a local SFT merged model at /workspace/typst_universe_scrape/outputs/qwen35-9b-merged, which is itself based on Qwen/Qwen3.5-9B. For exact reproduction, load this adapter on top of that merged model rather than the public Qwen/Qwen3.5-9B checkpoint.

Contents

adapter_model.safetensors: LoRA adapter weights exported from the VERL FSDP checkpoint
adapter_config.json: PEFT adapter configuration
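As a quick sanity check on the adapter configuration, the sketch below parses a hypothetical excerpt of adapter_config.json and computes the LoRA scaling factor. The rank, alpha, and dropout values are the ones reported in the Training Snapshot. The excerpt itself is an assumption, not a copy of the real file, and the scaling formula assumes standard LoRA (lora_alpha / r) rather than rsLoRA.

```python
import json

# Hypothetical excerpt of adapter_config.json. The three LoRA values come from
# the Training Snapshot; the real file contains more fields (for example
# target_modules), which are not reproduced here.
ADAPTER_CONFIG_TEXT = """
{
  "peft_type": "LORA",
  "r": 64,
  "lora_alpha": 128,
  "lora_dropout": 0.0
}
"""


def lora_scaling(config: dict) -> float:
    """Return the factor applied to the low-rank update: lora_alpha / r."""
    return config["lora_alpha"] / config["r"]


config = json.loads(ADAPTER_CONFIG_TEXT)
scaling = lora_scaling(config)
print(f"rank={config['r']} alpha={config['lora_alpha']} scaling={scaling}")
```

With rank 64 and alpha 128, the low-rank update is scaled by 2.0 before it is added to the frozen base weights.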

Loading

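from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholder path to the warm SFT merged model, not the plain public base.
base_model = "/path/to/qwen35-9b-merged"
adapter = "uam-rl/qwen35-9b-typst-grpo-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    device_map="auto",   # place weights on the available devices
)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter)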
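Because this repository ships the adapter only, a standalone checkpoint has to be produced locally if one is needed. The following is a minimal sketch of one way to do that with PEFT's merge_and_unload(). The helper name merge_adapter and the output directory are hypothetical, and this is not necessarily how the warm merged model itself was produced.

```python
def merge_adapter(base_model: str, adapter: str, out_dir: str) -> None:
    """Fold the LoRA weights into the base model and save a standalone copy."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    model = PeftModel.from_pretrained(model, adapter)

    # merge_and_unload() returns the base model with the LoRA deltas added in.
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)


# Example call (all paths are placeholders):
# merge_adapter(
#     "/path/to/qwen35-9b-merged",
#     "uam-rl/qwen35-9b-typst-grpo-lora",
#     "/path/to/output",
# )
```

Merging removes the PEFT dependency at inference time, at the cost of storing a full copy of the model weights.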

Training Snapshot

Method: VERL GRPO, DrGRPO-style (advantage normalization disabled)
Warm start: local SFT merged model, not the plain public base
Adapter: LoRA rank 64, alpha 128, dropout 0.0
Optimizer: MuonWithAdamW hybrid
Rollout engine: vLLM, tensor parallel size 2
Rollout cap: 32k response tokens with thinking enabled
Checkpoint source: /workspace/eval_results/typst_grpo_real_bf16_sp/checkpoints/global_step_1/actor
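To illustrate the "advantage normalization disabled" setting listed under Method, here is a minimal sketch of group-relative advantages for the responses sampled from one prompt. It assumes the setting means that the group mean is still subtracted but the result is not divided by the group standard deviation. That reading, the function name, the example rewards, and the use of a population standard deviation in the normalized branch are all assumptions and are not taken from the run's configuration.

```python
from statistics import mean, pstdev


def group_advantages(rewards, normalize_by_std=False, eps=1e-6):
    """Group-relative advantages for one prompt's sampled responses.

    With normalize_by_std=False only the group mean is subtracted (the
    DrGRPO-style variant assumed here). With True the centered rewards are
    also divided by the group standard deviation, as in standard GRPO.
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize_by_std:
        return centered
    scale = pstdev(rewards) + eps  # eps guards against a zero-variance group
    return [c / scale for c in centered]


# Hypothetical rewards for four rollouts of the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_advantages(rewards))
print(group_advantages(rewards, normalize_by_std=True))
```

Without the division, the advantage magnitude stays tied to the raw reward gap instead of being rescaled to roughly unit variance within every group.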

