Instructions for using vadimbelsky/qwen3.5-medical-ft-stage3-dpo with libraries and local apps.

Libraries

How to use vadimbelsky/qwen3.5-medical-ft-stage3-dpo with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="vadimbelsky/qwen3.5-medical-ft-stage3-dpo")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("vadimbelsky/qwen3.5-medical-ft-stage3-dpo")
model = AutoModelForImageTextToText.from_pretrained("vadimbelsky/qwen3.5-medical-ft-stage3-dpo")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use vadimbelsky/qwen3.5-medical-ft-stage3-dpo with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vadimbelsky/qwen3.5-medical-ft-stage3-dpo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vadimbelsky/qwen3.5-medical-ft-stage3-dpo",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/vadimbelsky/qwen3.5-medical-ft-stage3-dpo

SGLang

How to use vadimbelsky/qwen3.5-medical-ft-stage3-dpo with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vadimbelsky/qwen3.5-medical-ft-stage3-dpo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vadimbelsky/qwen3.5-medical-ft-stage3-dpo",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vadimbelsky/qwen3.5-medical-ft-stage3-dpo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "vadimbelsky/qwen3.5-medical-ft-stage3-dpo",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Unsloth Studio

How to use vadimbelsky/qwen3.5-medical-ft-stage3-dpo with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for vadimbelsky/qwen3.5-medical-ft-stage3-dpo to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for vadimbelsky/qwen3.5-medical-ft-stage3-dpo to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for vadimbelsky/qwen3.5-medical-ft-stage3-dpo to start chatting

Load model with FastModel

# Install first (shell command, not Python): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="vadimbelsky/qwen3.5-medical-ft-stage3-dpo",
    max_seq_length=2048,
)

Docker Model Runner
How to use vadimbelsky/qwen3.5-medical-ft-stage3-dpo with Docker Model Runner:
```
docker model run hf.co/vadimbelsky/qwen3.5-medical-ft-stage3-dpo
```

Qwen3.5-9B Medical Triage — Stage 3 DPO (v4)

Emergency department (ED) triage model fine-tuned from Qwen3.5-9B in a three-stage pipeline: Stage 1 (general medical SFT) → Stage 2 (SFT from ED intake SOAP notes to ESI decisions) → Stage 3 (DPO alignment to reduce over-triage; this model).

Quantized to Q4_K_M GGUF for on-device inference.

Model Description

Given an ED SOAP intake note, the model outputs a structured triage decision:

ESI level (1–5) with justification
Key clinical findings
Time-to-provider target
Immediate interventions required

ESI (Emergency Severity Index) scale: 1 = Immediate life threat · 2 = Emergent high-risk · 3 = Urgent stable · 4 = Less urgent · 5 = Non-urgent

Training Pipeline

Stage	Method	Objective
1	SFT (LoRA r=16)	General medical knowledge (PubMed, clinical guidelines)
2	SFT (LoRA r=16)	SOAP note → structured ESI triage decision
3	DPO (LoRA r=8)	Reduce over-triage · preserve ESI 1/2 high-risk recall

Stage 3 DPO Details

Base: Stage 2 LoRA checkpoint (vadimbelsky/qwen3.5-medical-ft-stage2)
Dataset: dpo_dataset_v4.jsonl — 5,413 raw pairs → 7,789 weighted pairs
Loss: Combined apo_down × 0.3 + sft × 1.0 (MPO-style)
Beta: 0.5 · LR: 5e-5 · Epochs: 0.1 (47 steps)
Batch: 2 × 8 gradient accumulation = effective 16
ESI label prepending: All chosen/rejected completions prefixed with explicit ESI label (e.g. ESI 2 — Emergent (high risk)\n\n...) to anchor preference signal at token position 0
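
The label-prepending step can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the function names and the pair field names (`chosen_esi`, `rejected_esi`) are hypothetical, and only the ESI 2 label wording comes from the example above. The other label texts are assumed from the ESI scale listed earlier.

```python
# Sketch: prefix both completions of a preference pair with an explicit ESI label.
# Only the ESI 2 wording is taken from the card; the rest are assumptions.
ESI_LABELS = {
    1: "Immediate life threat",
    2: "Emergent (high risk)",
    3: "Urgent stable",
    4: "Less urgent",
    5: "Non-urgent",
}
SEP = " \u2014 "  # the dash character used in the card's example label


def esi_prefix(level: int) -> str:
    """Build the label line plus the blank line that precedes the completion."""
    return f"ESI {level}{SEP}{ESI_LABELS[level]}\n\n"


def prepend_labels(pair: dict) -> dict:
    """Return a new pair whose chosen/rejected texts start with their ESI label."""
    return {
        "prompt": pair["prompt"],
        "chosen": esi_prefix(pair["chosen_esi"]) + pair["chosen"],
        "rejected": esi_prefix(pair["rejected_esi"]) + pair["rejected"],
    }


# Hypothetical anti-overtriage pair: ESI 3 is preferred, ESI 2 is rejected.
example = {
    "prompt": "<SOAP intake note here>",
    "chosen": "Stable vitals, needs multiple resources.",
    "chosen_esi": 3,
    "rejected": "High-risk presentation.",
    "rejected_esi": 2,
}
labeled = prepend_labels(example)
print(labeled["chosen"].splitlines()[0])
```

With the label first, the chosen and rejected completions differ from their first tokens, which is the anchoring effect described above.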

Dataset Sources (v4)

Source	Description	Raw pairs	Weight	Weighted
A	Anti-overtriage synthetic (ESI 3→1/2 rejected)	2,388	1×	2,388
B	Anti-overtriage synthetic (ESI 4/5→1/2 rejected)	1,500	1×	1,500
C	Edge cases (synthetic boundary scenarios)	39	1×	39
D	ESI 1/2 anchor pairs (high-risk recall preservation)	890	3×	2,670
E-over	ESI 3 bidirectional — anti-overtriage	297	2×	594
E-under	ESI 3 bidirectional — anti-undertriage	299	2×	598
Total		5,413		7,789
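
The totals in the table follow from multiplying each source's raw pair count by its weight. The sketch below reproduces that arithmetic. It assumes weighting is applied by repeating pairs, which the card does not state (per-sample loss weights would give the same totals), and the helper names are hypothetical.

```python
# Sketch: reproduce the raw and weighted pair totals from the source table.
# Each entry is (raw pairs, weight), copied from the table above.
SOURCES = {
    "A": (2388, 1),
    "B": (1500, 1),
    "C": (39, 1),
    "D": (890, 3),
    "E-over": (297, 2),
    "E-under": (299, 2),
}


def weighted_counts(sources: dict) -> dict:
    """Weighted pair count per source: raw pairs multiplied by the weight."""
    return {name: raw * weight for name, (raw, weight) in sources.items()}


def expand(pairs: list, weight: int) -> list:
    """Assumed weighting mechanism: repeat every pair `weight` times."""
    return [pair for pair in pairs for _ in range(weight)]


raw_total = sum(raw for raw, _ in SOURCES.values())
weighted = weighted_counts(SOURCES)
weighted_total = sum(weighted.values())
print(raw_total, weighted_total)  # 5413 7789

# Share of the weighted set that comes from the ESI 1/2 anchor pairs (source D).
anchor_share = weighted["D"] / weighted_total
print(f"{anchor_share:.1%}")
```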

Evaluation Results

Evaluated on MIMIC-IV-Ext Triage Instruction Corpus (MIETIC) — 36 human-expert validated RETAIN cases.

v4 vs Previous Stages

Metric	Stage 2 (SFT)	v1 DPO	v2 DPO	v3 DPO	v4 DPO	Target
Accuracy	~68%	55.6%	50.0%	27.8%	75.0%	>82%
Over-triage rate	~22%	22.2%	30.6%	0%	13.9%	<10%
Under-triage rate	~8%	36.1%	41.7%	72.2%	11.1%	<6%
High-risk recall (ESI 1+2)	~84%	76%	64%	40%	92%	100%
ESI 3 accuracy	~45%	~40%	~30%	~0%	60%	>65%

v4 Detailed Results (MIETIC, n=36)

Samples evaluated   : 36
ESI level parsed    : 36 / 36
Correct             : 27
Accuracy            : 75.0%
Under-triage rate   : 11.1% (4 cases)
Over-triage rate    : 13.9% (5 cases)
High-risk recall    : 92.0% (ESI 1+2, n=25)

Per-ESI Accuracy:

ESI Level	N	Correct	Accuracy
ESI 1	14	12	85.7%
ESI 2	11	9	81.8%
ESI 3	5	3	60.0%
ESI 4	4	2	50.0%
ESI 5	2	1	50.0%

Confusion Matrix (rows = ground truth, cols = predicted):

GT \ Pred  ESI 1  ESI 2  ESI 3  ESI 4  ESI 5
ESI 1         12      2      0      0      0
ESI 2          0      9      2      0      0
ESI 3          0      2      3      0      0
ESI 4          0      0      2      2      0
ESI 5          0      0      0      1      1

All 9 errors are off by exactly one ESI level (boundary confusions); no case is mis-triaged by two or more levels.
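
The reported metrics can be recomputed from the confusion matrix. The sketch below assumes definitions that match the card's counts but are not spelled out in it: over-triage means a predicted level more urgent (numerically lower) than ground truth, under-triage means the reverse, and high-risk recall is the share of true ESI 1/2 cases predicted as ESI 1 or 2. The function name is hypothetical.

```python
# Sketch: derive the v4 metrics from the confusion matrix
# (rows = ground truth ESI 1..5, columns = predicted ESI 1..5).
CONFUSION = [
    [12, 2, 0, 0, 0],
    [0, 9, 2, 0, 0],
    [0, 2, 3, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 0, 1, 1],
]


def triage_metrics(cm: list) -> dict:
    levels = range(len(cm))
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in levels)
    # Predicted index below the true index means a more urgent prediction.
    over = sum(cm[g][p] for g in levels for p in levels if p < g)
    under = sum(cm[g][p] for g in levels for p in levels if p > g)
    high_risk_total = sum(cm[0]) + sum(cm[1])
    high_risk_hit = sum(cm[g][p] for g in (0, 1) for p in (0, 1))
    # Largest distance, in ESI levels, between truth and any observed error.
    max_error = max(
        (abs(g - p) for g in levels for p in levels if g != p and cm[g][p]),
        default=0,
    )
    return {
        "n": n,
        "accuracy": correct / n,
        "over_triage": over,
        "under_triage": under,
        "high_risk_recall": high_risk_hit / high_risk_total,
        "per_esi_accuracy": [cm[i][i] / sum(cm[i]) for i in levels],
        "max_error_distance": max_error,
    }


metrics = triage_metrics(CONFUSION)
print(metrics["accuracy"], metrics["over_triage"], metrics["under_triage"])  # 0.75 5 4
```

Under these definitions, `max_error_distance` is 1 for this matrix, which matches the statement that every error is a one-level boundary confusion.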

Key Lessons from DPO Iteration

v1–v3 failure: IPO/sigmoid loss collapsed when dataset direction was 100% anti-overtriage → catastrophic under-triage regression (40% high-risk recall at worst)
v4 fix: (1) ESI label prepended at token position 0 for unambiguous preference signal; (2) apo_down + sft combined loss preserves ESI 1/2 recall via SFT component; (3) Sources D (ESI 1/2 anchors ×3) + E (ESI 3 bidirectional ×2) balance dataset direction

Usage

# Requires llama.cpp server running with the Q4_K_M GGUF
# llama-server --model qwen3.5-medical-ft-stage3-dpo-q4km.gguf --port 8080 -c 4096

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

SYSTEM_PROMPT = (
    "You are an expert emergency medicine triage nurse. "
    "Given a SOAP intake note, provide a structured triage decision including "
    "ESI level with justification, key clinical findings, time-to-provider target, "
    "and any immediate interventions required."
)

response = client.chat.completions.create(
    model="local",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "<SOAP intake note here>"},
    ],
    temperature=0.1,
    max_tokens=512,
)
print(response.choices[0].message.content)
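
To score or route on the response, the ESI level has to be extracted from the generated text. The following is a minimal sketch, assuming the response contains a label such as `ESI 2` near the start, as in the training label format described above. The pattern and function name are hypothetical and are not the parser used in the evaluation.

```python
# Sketch: pull the first ESI level (1-5) out of a model response.
import re
from typing import Optional

# Accepts forms like "ESI 2", "ESI: 2", "ESI level 3", "esi-4".
ESI_PATTERN = re.compile(r"\bESI\s*(?:level\s*)?[:\-]?\s*([1-5])\b", re.IGNORECASE)


def parse_esi_level(text: str) -> Optional[int]:
    """Return the first ESI level found in `text`, or None if there is none."""
    match = ESI_PATTERN.search(text)
    return int(match.group(1)) if match else None


sample = "ESI 2 \u2014 Emergent (high risk)\n\nKey findings: chest pain, diaphoresis."
print(parse_esi_level(sample))  # 2
```

Returning `None` rather than a default level keeps unparseable outputs visible, so they can be counted separately instead of being scored as a valid triage decision.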

Limitations & Safety

⚠️ This model is for research purposes only. It must NOT be used for clinical decision-making without licensed clinician oversight.

Evaluated on 36 MIETIC validation cases — not a clinical trial
11.1% under-triage rate (4 of 36 cases) means critical patients may be down-triaged
92% high-risk recall means 2 of the 25 ESI 1/2 cases (8%) were assigned ESI 3; comparable patients may be missed
Model has not been validated on real ED populations
Fine-tuned on synthetic + MIMIC-IV derived data only

Training Infrastructure

Hardware: NVIDIA GB10 (121 GB VRAM), 1 GPU
Framework: Unsloth 2026.3.4 + TRL DPOTrainer + Transformers 5.2.0
Training time: ~2 hours (47 steps)
Quantization: GGUF Q4_K_M via llama.cpp

Fine-tuned with Unsloth 🦥

Downloads last month: 87

Safetensors
Model size: 10B params
Tensor type: BF16, F32
