Instructions for using AdamLucek/Qwen3-4B-Instruct-2507-PII-RL with libraries and local apps.

Libraries

How to use AdamLucek/Qwen3-4B-Instruct-2507-PII-RL with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="AdamLucek/Qwen3-4B-Instruct-2507-PII-RL")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AdamLucek/Qwen3-4B-Instruct-2507-PII-RL")
model = AutoModelForCausalLM.from_pretrained("AdamLucek/Qwen3-4B-Instruct-2507-PII-RL")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use AdamLucek/Qwen3-4B-Instruct-2507-PII-RL with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
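
The same request can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative rather than part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL"


def build_chat_request(content, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(content, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(chat("What is the capital of France?"))
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper target any other server that exposes the OpenAI-compatible chat completions route.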

SGLang

How to use AdamLucek/Qwen3-4B-Instruct-2507-PII-RL with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use AdamLucek/Qwen3-4B-Instruct-2507-PII-RL with Docker Model Runner:
```
docker model run hf.co/AdamLucek/Qwen3-4B-Instruct-2507-PII-RL
```

Qwen3-4B-Instruct-2507-PII-RL

Qwen3-4B-Instruct-2507-PII-RL is a LoRA reinforcement learning fine-tune of Qwen/Qwen3-4B-Instruct-2507, trained for 740 policy updates on batches sampled from AdamLucek/open-pii-masking-en-us-30k using the adamlucek/pii-masking environment.

Qwen3-4B-Instruct-2507-PII-RL is trained to mask personally identifiable information (PII). Given an input phrase, it outputs the same phrase with every PII instance replaced by [PII].

Training

This model was trained using Tinker and the adamlucek/pii-masking environment with the following specs:

| Parameter | Value |
| --- | --- |
| Method | LoRA (`rank=32`) |
| Environment | `pii-masking` verifiers environment |
| Batch size | 256 trajectories (`groups_per_batch=32` × `group_size=8`) |
| Max sequence length | 512 tokens |
| Optimizer | Adam (`lr=1e-5`, `β1=0.9`, `β2=0.95`, `ε=1e-8`) |
| Scheduler | Constant learning rate |
| Dataset | Full training set (`num_train_examples=-1`) |
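
For reference, the hyperparameters in the table can be collected into a single configuration object. This is only an illustrative sketch: the `TrainingConfig` dataclass and its field layout are hypothetical and do not reflect the Tinker API. The values are copied from the table, and the sketch assumes `group_size` is the number of sampled completions per group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Hypothetical container; values are taken from the table above.
    lora_rank: int = 32
    groups_per_batch: int = 32
    group_size: int = 8
    max_seq_len: int = 512
    learning_rate: float = 1e-5
    adam_beta1: float = 0.9
    adam_beta2: float = 0.95
    adam_eps: float = 1e-8
    num_train_examples: int = -1  # -1 selects the full training set

    @property
    def trajectories_per_batch(self) -> int:
        # Assumes each group contributes `group_size` sampled completions.
        return self.groups_per_batch * self.group_size


config = TrainingConfig()
print(config.trajectories_per_batch)  # 256
```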

Over 740 training steps, the following reward curve was produced:

[Figure: reward curve over the 740 training steps]

Rewards

The reward function is a weighted combination of three components:

| Component | Weight | Description |
| --- | --- | --- |
| `exact_match_reward` | 1.0 | Binary reward (1.0 if the parsed masked output exactly matches the expected answer character-by-character, 0.0 otherwise) |
| `pii_count_reward` | 0.5 | Binary reward (1.0 if the number of `[PII]` tags in the output matches the expected count, 0.0 otherwise) |
| `format_reward` | 0.1 | Parser-generated format reward ensuring the output is properly formatted with valid XML tags (`<masked_output>...</masked_output>`) |
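
As a rough illustration of what these components check, the sketch below reimplements them with a simple regex parser. It is not the code of the `pii-masking` environment: the parser, the function signatures, the fallback to the raw completion when the tags are missing, and the binary treatment of `format_reward` are all assumptions made for this example.

```python
import re

# Hypothetical parser: grabs the text inside <masked_output>...</masked_output>.
_TAG_RE = re.compile(r"<masked_output>\s*(.*?)\s*</masked_output>", re.DOTALL)


def parse_masked_output(completion):
    """Return the text inside the tags, or None if the tags are missing."""
    match = _TAG_RE.search(completion)
    return match.group(1) if match else None


def exact_match_reward(completion, expected):
    """1.0 if the parsed output equals the expected masked text, else 0.0."""
    return 1.0 if parse_masked_output(completion) == expected else 0.0


def pii_count_reward(completion, expected):
    """1.0 if the number of [PII] tags matches the expected count, else 0.0."""
    parsed = parse_masked_output(completion)
    # Assumption: fall back to the raw completion when the tags are missing.
    text = parsed if parsed is not None else completion
    return 1.0 if text.count("[PII]") == expected.count("[PII]") else 0.0


def format_reward(completion):
    """1.0 if a <masked_output> block is present, else 0.0 (binary here)."""
    return 1.0 if parse_masked_output(completion) is not None else 0.0
```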

The final reward is calculated as:

reward = (1.0 × exact_match_reward) + (0.5 × pii_count_reward) + (0.1 × format_reward)

Reward Range: The reward ranges from 0.0 (worst) to 1.6 (best). Following the formula above:

1.6: Exact match with correct PII count and valid format
1.5: Exact match with correct PII count but invalid format
1.0-1.1: Exact match but incorrect PII count, without/with format compliance
0.5-0.6: Correct PII count but inexact match, without/with format compliance
0.0-0.1: No exact match and incorrect PII count, without/with format compliance
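
To make the range concrete, the weighted sum can be enumerated over every combination of component values. This sketch assumes all three components are binary (treating the format reward as 0 or 1 is an assumption), and `total_reward` is an illustrative helper rather than a function from the environment.

```python
from itertools import product

WEIGHTS = {"exact_match_reward": 1.0, "pii_count_reward": 0.5, "format_reward": 0.1}


def total_reward(exact_match, pii_count, fmt):
    """Weighted sum of the three component rewards."""
    return (
        WEIGHTS["exact_match_reward"] * exact_match
        + WEIGHTS["pii_count_reward"] * pii_count
        + WEIGHTS["format_reward"] * fmt
    )


# Enumerate every combination of binary component values.
for exact_match, pii_count, fmt in product((0.0, 1.0), repeat=3):
    score = round(total_reward(exact_match, pii_count, fmt), 1)
    print(f"exact={exact_match:.0f} count={pii_count:.0f} format={fmt:.0f} -> {score}")
```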

Usage

Loading and using the model via Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face Hub repository for this model
model_id = "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL"

# Load the merged model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

system_prompt = """Replace all personally identifiable information (PII) in the text with [PII] tags. 
PII includes: names, dates, phone numbers, SSNs, account numbers, addresses, email addresses, and any other identifying information.

Examples:
Input: Ticket Reservation for Florije: 'one ticket for Madame on October 8th, 1990'
Output: Ticket Reservation for [PII]: 'one ticket for [PII] on [PII]'

Input: User account recovery: "Hi Arljind Komla, your account recovery key is 426220045."
Output: User account recovery: "Hi [PII], your account recovery key is [PII]."

Return ONLY the masked text wrapped in masked_output XML tags:
<masked_output>
[Your masked text here]
</masked_output>"""

# Prepare a prompt for inference using the messages format
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Hi Balian, we are reaching out to confirm your gaming preferences. Your account, EL@protonmail.com, has been inactive for 46 months. Please verify your account details, including 72611183194555."},
]
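text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Tokenize and move the inputs to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Perform inference (max_length counts the prompt plus the generated tokens)
output = model.generate(**inputs, max_length=512)

# Decode and print only the newly generated tokens, skipping the prompt
generated = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))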

Output:

<masked_output>
Hi [PII], we are reaching out to confirm your gaming preferences. Your account, [PII], has been inactive for [PII] months. Please verify your account details, including [PII].
</masked_output>
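
Downstream code usually needs only the masked text rather than the surrounding tags. A small helper along these lines can strip them; `extract_masked_text` is an illustrative name, and falling back to the raw text when the tags are missing is a design choice made for this sketch.

```python
import re


def extract_masked_text(generated):
    """Return the text inside <masked_output> tags, or the stripped input if absent."""
    match = re.search(r"<masked_output>(.*?)</masked_output>", generated, re.DOTALL)
    return match.group(1).strip() if match else generated.strip()


sample = """<masked_output>
Hi [PII], your account, [PII], has been inactive for [PII] months.
</masked_output>"""
print(extract_masked_text(sample))
```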