Qwen3-4B-Instruct-2507-PII-RL
Qwen3-4B-Instruct-2507-PII-RL is a LoRA reinforcement learning fine-tune of Qwen/Qwen3-4B-Instruct-2507, trained for 740 policy updates on batches sampled from AdamLucek/open-pii-masking-en-us-30k using the adamlucek/pii-masking environment.
Qwen3-4B-Instruct-2507-PII-RL has been trained to mask personally identifiable information (PII). Given an input phrase, it outputs the same phrase with every PII instance replaced by [PII].
Training
This model was trained using Tinker and the adamlucek/pii-masking environment with the following specs:
| Parameter | Value |
|---|---|
| Method | LoRA (rank=32) |
| Environment | pii-masking verifiers environment |
| Batch size | 256 trajectories (groups_per_batch=32 × group_size=8) |
| Max sequence length | 512 tokens |
| Optimizer | Adam (lr=1e-5, β1=0.9, β2=0.95, ε=1e-8) |
| Scheduler | Constant learning rate |
| Dataset | Full training set (num_train_examples=-1) |
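
As a quick check of the batch arithmetic in the table, the sketch below multiplies the two batch parameters. It also estimates the total number of sampled trajectories under the assumption that each of the 740 policy updates consumes exactly one batch; that one-to-one mapping is an assumption made here for illustration, not something the table states.

```python
# Values taken from the training table above.
groups_per_batch = 32
group_size = 8
policy_updates = 740

# Trajectories sampled per batch: groups_per_batch groups, group_size rollouts each.
trajectories_per_batch = groups_per_batch * group_size

# Assumption (not stated in the table): one batch per policy update.
total_trajectories = trajectories_per_batch * policy_updates

print(trajectories_per_batch)  # 256
print(total_trajectories)      # 189440
```
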
Over 740 training steps, the following reward curve was produced:
Rewards
The reward function is a weighted combination of three components:
| Component | Weight | Description |
|---|---|---|
| exact_match_reward | 1.0 | Binary reward (1.0 if the parsed masked output exactly matches the expected answer character-by-character, 0.0 otherwise) |
| pii_count_reward | 0.5 | Binary reward (1.0 if the number of [PII] tags in the output matches the expected count, 0.0 otherwise) |
| format_reward | 0.1 | Parser-generated format reward ensuring the output is properly formatted with valid XML tags (<masked_output>...</masked_output>) |
The final reward is calculated as:
reward = (1.0 × exact_match_reward) + (0.5 × pii_count_reward) + (0.1 × format_reward)
Reward Range: The reward can range from 0.0 (worst) to 1.6 (best). Treating each component as either 0 or 1, the formula gives:
- 1.6: Exact match with correct PII count and valid format
- 1.5: Exact match with correct PII count but invalid format
- 1.0-1.1: Exact match but incorrect PII count, without/with format compliance
- 0.5-0.6: Correct PII count but inexact match, without/with format compliance
- 0.0-0.1: No match and incorrect PII count, without/with format compliance
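
The weighted sum can be sketched in a few lines of Python. This is an illustration of the formula only, not the environment's actual implementation: the function names and the two simple component checks are assumptions, and the format score is passed in directly rather than computed by a parser.

```python
# Weights from the component table above.
WEIGHTS = {"exact_match_reward": 1.0, "pii_count_reward": 0.5, "format_reward": 0.1}


def exact_match_score(output: str, expected: str) -> float:
    """1.0 if the masked output equals the expected answer exactly, else 0.0."""
    return 1.0 if output == expected else 0.0


def pii_count_score(output: str, expected: str) -> float:
    """1.0 if both strings contain the same number of [PII] tags, else 0.0."""
    return 1.0 if output.count("[PII]") == expected.count("[PII]") else 0.0


def combined_reward(output: str, expected: str, format_score: float) -> float:
    """Weighted sum of the three components; format_score is supplied by the caller."""
    return (
        WEIGHTS["exact_match_reward"] * exact_match_score(output, expected)
        + WEIGHTS["pii_count_reward"] * pii_count_score(output, expected)
        + WEIGHTS["format_reward"] * format_score
    )


expected = "Hi [PII], your account recovery key is [PII]."
perfect = "Hi [PII], your account recovery key is [PII]."
missed_one = "Hi [PII], your account recovery key is 426220045."

print(round(combined_reward(perfect, expected, 1.0), 2))     # 1.6
print(round(combined_reward(missed_one, expected, 1.0), 2))  # 0.1
```

The second call scores 0.1 because the text differs from the expected answer and contains one [PII] tag instead of two, so only the format component contributes.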
Usage
Loading and using the model via Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
# Model repository on the Hugging Face Hub
model_id = "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL"
# Load the merged model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
system_prompt = """Replace all personally identifiable information (PII) in the text with [PII] tags.
PII includes: names, dates, phone numbers, SSNs, account numbers, addresses, email addresses, and any other identifying information.
Examples:
Input: Ticket Reservation for Florije: 'one ticket for Madame on October 8th, 1990'
Output: Ticket Reservation for [PII]: 'one ticket for [PII] on [PII]'
Input: User account recovery: "Hi Arljind Komla, your account recovery key is 426220045."
Output: User account recovery: "Hi [PII], your account recovery key is [PII]."
Return ONLY the masked text wrapped in masked_outputXML tags:
<masked_output>
[Your masked text here]
</masked_output>"""
# Prepare a prompt for inference using the messages format
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Hi Balian, we are reaching out to confirm your gaming preferences. Your account, EL@protonmail.com, has been inactive for 46 months. Please verify your account details, including 72611183194555."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Perform inference; max_new_tokens limits only the generated text, not the prompt
output = model.generate(**inputs, max_new_tokens=256)
# Decode and print only the newly generated tokens, skipping the prompt
new_tokens = output[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
Output:
<masked_output>
Hi [PII], we are reaching out to confirm your gaming preferences. Your account, [PII], has been inactive for [PII] months. Please verify your account details, including [PII].
</masked_output>
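
Downstream code usually wants the masked text without the surrounding tags. A minimal sketch of one way to extract it, assuming the completion follows the format shown above; the helper name `extract_masked_text` and the regular expression are illustrative and not part of the model or the training environment:

```python
import re
from typing import Optional

# Capture everything between the tags, ignoring surrounding whitespace.
MASKED_OUTPUT_RE = re.compile(r"<masked_output>\s*(.*?)\s*</masked_output>", re.DOTALL)


def extract_masked_text(completion: str) -> Optional[str]:
    """Return the text inside <masked_output> tags, or None if the tags are missing."""
    match = MASKED_OUTPUT_RE.search(completion)
    return match.group(1) if match else None


completion = (
    "<masked_output>\n"
    "Hi [PII], we are reaching out to confirm your gaming preferences. "
    "Your account, [PII], has been inactive for [PII] months. "
    "Please verify your account details, including [PII].\n"
    "</masked_output>"
)

masked = extract_masked_text(completion)
print(masked)
print(masked.count("[PII]"))  # 4
```

Returning None for a missing or malformed tag pair lets the caller decide whether to retry generation or fall back to the raw completion.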
LoRA Adapter
The unmerged LoRA adapter is available in lora_adapter.
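
A hedged sketch of attaching the adapter to the base model with the PEFT library. It assumes that `lora_adapter` is a subfolder of this repository and that the adapter files are stored in PEFT format; neither is confirmed above, so adjust the constants if the layout differs. The function name `load_base_with_adapter` is hypothetical.

```python
BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_REPO_ID = "AdamLucek/Qwen3-4B-Instruct-2507-PII-RL"
# Assumption: the adapter lives in a subfolder with this name, in PEFT format.
ADAPTER_SUBFOLDER = "lora_adapter"


def load_base_with_adapter():
    """Load the base model and attach the LoRA adapter (downloads weights)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(
        base_model, ADAPTER_REPO_ID, subfolder=ADAPTER_SUBFOLDER
    )
    return model, tokenizer
```

Calling `load_base_with_adapter()` returns a model that can be used with the same chat-template and generation steps as the merged checkpoint.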
Additional Information
For all other information about the base model and usage, refer to the original Qwen/Qwen3-4B-Instruct-2507 page.