Qwen3.5-9B GRPO v49: ESI Triage (LoRA Adapter)
LoRA adapter (r=32, ~225 MB) for Qwen3.5-9B trained with GRPO. v49 refines v46 by adding rule-aware reward bonuses that target specific clinical-rule failures identified in v46's error analysis.
Result on MIETIC-36 (dual-mode eval):
- With thinking: 77.8% exact / 100.0% adjacent
- Without thinking: 77.8% exact / 100.0% adjacent
v49 is the first model in this series to combine v46's exact accuracy with v47's zero-dangerous-error safety profile, and the first to produce identical results across thinking modes.
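The two reported metrics can be sketched as follows. This is a minimal illustration, assuming "adjacent" means a prediction within one ESI level of the gold label (exact matches included); the function name `exact_and_adjacent` is hypothetical and not taken from the evaluation code.

```python
def exact_and_adjacent(gold, pred):
    """Return (exact accuracy, adjacent accuracy) for paired ESI labels.

    Assumption: "adjacent" counts predictions within +/-1 level of gold.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    n = len(gold)
    exact = sum(g == p for g, p in zip(gold, pred)) / n
    adjacent = sum(abs(g - p) <= 1 for g, p in zip(gold, pred)) / n
    return exact, adjacent


# Toy labels for illustration only; these are not MIETIC cases.
exact, adjacent = exact_and_adjacent([1, 2, 3, 4], [1, 3, 3, 2])
```

If MIETIC-36 contains 36 cases, 77.8% exact corresponds to 28 of 36 correct, and 100.0% adjacent means none of the 8 misses was off by more than one level.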
For a full merged version (no PEFT required at inference), see vadimbelsky/qwen3.5-esi-triage-grpo-v49-merged.
What changed from v46
Error triage on v46's 8 wrong cases revealed three rule-application failures:
- Missed "lifesaving intervention already performed → ESI 1" (3 cases): narratives with "intubated", "chest tube placed", or "central line placed" were not recognized as ESI-1 Step A triggers.
- Missed severe pain rule (1 case): pain ≥ 7 should anchor ESI ≤ 2 unless ESI-1 criteria are present.
- Missed open injury rule (1 case): open fractures and penetrating trauma should anchor ESI ≤ 2.
v49 adds rule-aware reward bonuses:
| Trigger in case text (regex) | Reward modifier |
|---|---|
| `intubat\|chest tube\|…` | … |
| `intubat\|chest tube\|…` | … |
| `open fracture\|penetrating\|…` | … |
| Pain ≥ 7 + gold = 2 + pred > 2 | −0.5 |
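A rough sketch of how such rule triggers might be checked is shown below. The regexes are assumptions assembled from the phrases quoted in this card, not the training patterns themselves, and the names `triggered_rules` and `pain_rule_modifier` are hypothetical. Only the −0.5 pain modifier comes from the table; how the pain score is extracted from the narrative is not specified here, so it is passed in as an argument.

```python
import re

# Hypothetical patterns built from phrases quoted in the card; the actual
# training regexes are not fully recoverable from this page.
LIFESAVING = re.compile(r"intubat|chest tube|central line", re.IGNORECASE)
OPEN_INJURY = re.compile(r"open fracture|penetrating", re.IGNORECASE)


def triggered_rules(case_text):
    """Return the names of the rule triggers that fire on a case narrative."""
    hits = []
    if LIFESAVING.search(case_text):
        hits.append("lifesaving_intervention")
    if OPEN_INJURY.search(case_text):
        hits.append("open_injury")
    return hits


def pain_rule_modifier(pain_score, gold, pred):
    """Return -0.5 when pain >= 7, gold ESI is 2, and the prediction is less urgent."""
    if pain_score is not None and pain_score >= 7 and gold == 2 and pred > 2:
        return -0.5
    return 0.0
```

For example, `triggered_rules("Arrived intubated. Central line placed.")` reports the lifesaving-intervention trigger, and `pain_rule_modifier(8, 2, 3)` applies the penalty from the last table row.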
Other changes (informed by v48's failure):
- Training budget raised 512 → 1024 tokens (matches eval, eliminates clipping)
- No-parse penalty hardened −0.5 → −2.0 (must dominate every wrong commitment)
- Warm-start from v46, 300 steps at LR 2e-7 (refinement, not relearning)
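The no-parse penalty can be illustrated with a small sketch. The −2.0 value is the one stated above; the output format matched by `ESI_PATTERN`, the helper names, and the caller-supplied `score_fn` are assumptions for illustration, since the card does not show the actual reward code.

```python
import re

NO_PARSE_PENALTY = -2.0  # value stated in the card

# Hypothetical answer formats: "ESI 3", "ESI level: 3", "ESI-3".
ESI_PATTERN = re.compile(r"ESI(?:\s*level)?\s*[:\-]?\s*([1-5])\b", re.IGNORECASE)


def parse_esi(completion):
    """Return the last ESI level stated in the completion, or None if absent."""
    matches = ESI_PATTERN.findall(completion)
    return int(matches[-1]) if matches else None


def reward(completion, gold, score_fn):
    """Score a completion; an unparseable answer gets the flat penalty.

    score_fn(pred, gold) is a caller-supplied scorer for parsed answers
    (hypothetical; the real scoring scheme is not given in the card).
    """
    pred = parse_esi(completion)
    if pred is None:
        return NO_PARSE_PENALTY
    return score_fn(pred, gold)
```

The design intent described above is that `NO_PARSE_PENALTY` sits below anything `score_fn` can return for a wrong answer, so committing to some ESI level always scores better than producing no parseable answer.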
Training metrics
v49 was the cleanest GRPO run of this series:
- `clipped_ratio` held at 4–19% throughout (vs v48's 90%+)
- `reward` positive from step 10, peaked at +0.49
- `reward_std` consistently ~1.0+ (strong GRPO learning signal)
- 300 steps in 17h 48m on NVIDIA GB10
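To see why these two metrics matter, here is a minimal sketch of the quantities involved. It assumes a completion counts as clipped when it reaches the token budget, and it uses the population standard deviation for the group normalization; actual GRPO implementations may differ on both details, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def clipped_ratio(completion_lengths, budget):
    """Fraction of completions that hit the token budget (assumed definition)."""
    return sum(n >= budget for n in completion_lengths) / len(completion_lengths)


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# When every completion in a group earns the same reward, the advantages are
# all zero and that group contributes no learning signal.
flat = group_advantages([1.0, 1.0, 1.0])
spread = group_advantages([1.0, -1.0])
```

A reward spread near 1.0 within each group keeps the advantages well away from zero, which is what makes a high `reward_std` a useful health indicator for a GRPO run.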
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
base = "Qwen/Qwen3.5-9B"
adapter = "vadimbelsky/qwen3.5-esi-triage-grpo-v49"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
model.eval()
SYSTEM = (
    "You are an expert emergency triage nurse. "
    "Extract clinical fields, apply the ESI algorithm step by step, then state the ESI level. "
    "Be concise — stay under 150 words total."
)
case = ("A 78-year-old female arrived intubated for airway protection. "
        "Central line placed. BP 120/58, HR 150, RR 20, SpO2 97%.")
prompt = tokenizer.apply_chat_template(
    [{"role": "system", "content": SYSTEM},
     {"role": "user", "content": case}],
    tokenize=False, add_generation_prompt=True,
)
out = model.generate(
    **tokenizer(prompt, return_tensors="pt").to(model.device),
    max_new_tokens=1024, temperature=0.1, do_sample=True,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
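When the model runs in thinking mode, the generated text may contain a reasoning block ahead of the final answer. The helper below is a sketch that assumes the reasoning is wrapped in `<think>…</think>` tags and that it is applied to the generated portion of the output only; `split_thinking` is a hypothetical name, not part of any library.

```python
import re

# Assumed delimiter for the reasoning block in the decoded text.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (reasoning, answer); reasoning is '' when no think block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>Intubated, so Step A.</think>\nESI 1")
```

Separating the two parts makes it easier to compare answers between the thinking and non-thinking modes reported above.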
Limitations
Research model. Not approved for clinical use. The rule bonuses target specific regex patterns that were measured failing in v46; they don't generalize. See the v46 model card for the full design journey and failure-mode lessons that shaped v49.