Qwen3.5 9B Typst GRPO LoRA

This repository contains the adapter-only checkpoint from the VERL Typst APPS GRPO run that completed one full training step on 2026-04-23. It does not include merged base-model weights.

The run was warm-started from a local SFT merged model at /workspace/typst_universe_scrape/outputs/qwen35-9b-merged, which is itself based on Qwen/Qwen3.5-9B. For exact reproduction, load this adapter on top of that merged model rather than the plain public base.

Contents

  • adapter_model.safetensors: exported LoRA adapter weights from the VERL FSDP checkpoint
  • adapter_config.json: PEFT adapter configuration

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

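# Path to the warm SFT merged model (not the plain public Qwen/Qwen3.5-9B base).
base_model = "/path/to/qwen35-9b-merged"
adapter = "uam-rl/qwen35-9b-typst-grpo-lora"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers across the available devices
)
# Attach the LoRA adapter weights on top of the loaded base model.
model = PeftModel.from_pretrained(model, adapter)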
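
Because this repository ships only the adapter, a standalone checkpoint has to be produced locally. The following is a minimal sketch of one way to do that with PEFT's merge_and_unload. The helper name merge_adapter and the output directory are hypothetical and not part of this repository. The sketch loads the base model without device_map, so it assumes enough host memory to hold the full model.

```python
def merge_adapter(base_model: str, adapter: str, output_dir: str) -> None:
    """Fold the LoRA weights into the base model and save a standalone copy.

    Hypothetical helper for illustration; it is not shipped with this repository.
    """
    # Imported lazily so that defining the helper does not need the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Load the warm SFT merged model, then attach the adapter on top of it.
    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    model = PeftModel.from_pretrained(model, adapter)

    # merge_and_unload() folds the LoRA deltas into the base weights and
    # returns a plain transformers model without the PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)

    # Save the tokenizer alongside so output_dir is self-contained.
    AutoTokenizer.from_pretrained(base_model).save_pretrained(output_dir)


# Example call (all paths are placeholders):
# merge_adapter(
#     "/path/to/qwen35-9b-merged",
#     "uam-rl/qwen35-9b-typst-grpo-lora",
#     "/path/to/qwen35-9b-typst-grpo-merged",
# )
```

The merged directory can then be loaded with AutoModelForCausalLM.from_pretrained alone, without PEFT installed.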

Training Snapshot

  • Method: VERL GRPO, DrGRPO-style (advantage normalization disabled)
  • Warm start: local SFT merged model, not the plain public base
  • Adapter: LoRA rank 64, alpha 128, dropout 0.0
  • Optimizer: MuonWithAdamW hybrid
  • Rollout engine: vLLM, tensor parallel size 2
  • Rollout cap: 32k response tokens with thinking enabled
  • Checkpoint source: /workspace/eval_results/typst_grpo_real_bf16_sp/checkpoints/global_step_1/actor
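
As a quick sanity check, the adapter settings listed above can be compared against a downloaded adapter_config.json. This is a minimal sketch using only the standard library. It assumes the file uses the usual PEFT LoRA keys (r, lora_alpha, lora_dropout); the local directory name and both helper functions are hypothetical.

```python
import json
from pathlib import Path

# Values taken from the Training Snapshot list; the key names are assumed
# to follow the usual PEFT LoRA adapter_config.json layout.
EXPECTED = {"r": 64, "lora_alpha": 128, "lora_dropout": 0.0}


def check_adapter_config(cfg: dict) -> list[str]:
    """Return human-readable mismatches between cfg and EXPECTED."""
    problems = []
    for key, want in EXPECTED.items():
        got = cfg.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, found {got!r}")
    return problems


def lora_scaling(cfg: dict) -> float:
    """Standard LoRA scaling factor lora_alpha / r (assumes rsLoRA is not in use)."""
    return cfg["lora_alpha"] / cfg["r"]


if __name__ == "__main__":
    # Hypothetical local path to the downloaded adapter directory.
    cfg_path = Path("qwen35-9b-typst-grpo-lora") / "adapter_config.json"
    if cfg_path.exists():
        cfg = json.loads(cfg_path.read_text())
        print(check_adapter_config(cfg) or "adapter_config.json matches the snapshot")
        print("LoRA scaling:", lora_scaling(cfg))
```

With rank 64 and alpha 128, the standard scaling factor applied to the LoRA update is 128 / 64 = 2.0.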