Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored
Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored is an uncensored model based on Qwen/Qwen3-Next-80B-A3B-Thinking, fine-tuned with SFT followed by GRPO, both using LoRA.
This model has been fine-tuned using an instruction format.
Disclaimer: We take no responsibility for the outputs of this model. Please use it at your own risk.
Training Process
This model was trained using a two-stage process:
Step 1: SFT (Supervised Fine-Tuning)
- Dataset: 12,000 samples
- Composition: Jailbreak 10k + General 1.5k + Logic 0.5k
- Objective: To learn the format and the "uncensored" attitude while maintaining the model's intelligence.
Step 2: GRPO (Reinforcement Learning)
- Dataset: 60,000 samples
- Reward Model: puwaer/Unsafe-Reward-Qwen3-1.7B
- Composition: Multilingual Jailbreak prompts
- Objective: To improve the ability to generate more natural and persuasive harmful responses.
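For readers unfamiliar with GRPO, the core idea is that several responses are sampled for each prompt, each response is scored by the reward model, and its advantage is that score standardized against the other responses in the same group. The sketch below illustrates only this normalization step. The function name, the sample rewards, and the `eps` value are illustrative assumptions, not details of the training run used for this model.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of responses to the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical reward-model scores for four sampled responses to one prompt.
rewards = [0.1, 0.4, 0.7, 0.8]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Responses scored above the group mean get a positive advantage and are reinforced; those below the mean get a negative one.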
Model Performance
Below is the comparative evaluation of this model (GRPO), the intermediate checkpoint (SFT), and the base model (Safe).
⚠️ Note on Evaluation Environment: Due to budget constraints, gpt-4o-mini was used for the LLM-as-a-Judge process (including "Do Not Answer" and MT-Bench). Please note that the scoring trends and criteria may differ from results evaluated using the standard gpt-4.
Safety Evaluation (Lower is better / Higher success in uncensoring)
In the "Do Not Answer" (DNA) and "Sorry Bench" benchmarks, the base model shows a high refusal rate (roughly 89%–100%), while this model achieves a much lower refusal rate of roughly 4%–15%.
| Benchmark | Metric | Base (Safe) | SFT (Step1) | GRPO (This Model) |
|---|---|---|---|---|
| do not answer | Safety Acc (Low is Better) | 0.9979 | 0.8275 | 0.147 |
| do not answer jp | Safety Acc (Low is Better) | 0.984 | 0.5378 | 0.0873 |
| Sorry Bench | Safety Acc (Low is Better) | 0.8886 | 0.8455 | 0.0409 |
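To make the size of the change easier to read, the following sketch computes the absolute and relative drop from the base model to the GRPO model for each benchmark. The values are copied from the table above; treating "Safety Acc" as a refusal rate follows the wording of this section, and the helper name `relative_reduction` is introduced here only for illustration.

```python
# Safety Acc values copied from the table above (treated here as refusal rates).
scores = {
    "do not answer":    {"base": 0.9979, "sft": 0.8275, "grpo": 0.147},
    "do not answer jp": {"base": 0.984,  "sft": 0.5378, "grpo": 0.0873},
    "Sorry Bench":      {"base": 0.8886, "sft": 0.8455, "grpo": 0.0409},
}

def relative_reduction(before, after):
    """Fraction of the original refusal rate that was removed."""
    return (before - after) / before

for name, s in scores.items():
    drop = s["base"] - s["grpo"]
    rel = relative_reduction(s["base"], s["grpo"])
    print(f"{name}: {s['base']:.1%} -> {s['grpo']:.1%} "
          f"({drop * 100:.1f} points, {rel:.1%} relative)")
```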
Capability Evaluation (Higher is better)
Generally, "uncensoring" (lobotomy) procedures tend to degrade a model's general intelligence. In this evaluation, the MT-Bench score fell from 8.044 (base) to 7.538 after SFT and held roughly steady at 7.513 after GRPO, while the LM Harness average stayed within 0.005 of the base model at every stage.
| Benchmark | Metric | Base (Safe) | SFT (Step1) | GRPO (This Model) |
|---|---|---|---|---|
| MT-Bench | Average Score (1-10) | 8.044 | 7.538 | 7.513 |
| LM Harness | Average Acc (GSM8K, MMLU) | 0.8454 | 0.8483 | 0.8436 |
*Comparisons are made against Qwen3-Next-80B-A3B-Thinking (Base).*
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "puwaer/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
# keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as final content
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)  # no opening <think> tag
print("content:", content)
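The parsing step above can be tried without loading the model. The sketch below isolates the same "find the last `</think>` token" logic in a small function and runs it on toy token ids. The function name `split_thinking` and the toy ids are assumptions made for illustration; only the id 151668 for `</think>` comes from the snippet above.

```python
def split_thinking(output_ids, end_think_id=151668):
    """Split generated token ids at the last </think> token.

    Returns (thinking_ids, content_ids). If the marker is absent,
    everything is treated as final content, matching the snippet above.
    """
    try:
        # Position just past the last occurrence of the marker.
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]

# Toy ids: 151668 stands in for </think>; the other numbers are arbitrary.
thinking_ids, content_ids = split_thinking([11, 12, 151668, 21, 22])
print(thinking_ids, content_ids)
```

The thinking half keeps the `</think>` id itself, which is why the original snippet decodes with `skip_special_tokens=True`.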
Data Overview
Datasets
The following datasets were used for training this model:
- Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1
- AI-MO/NuminaMath-CoT
- open-thoughts/OpenThoughts-114k
- puwaer/cvalues_rlhf_en_cot
- puwaer/cvalues_rlhf_zh_cot
- puwaer/cvalues_rlhf_jp_cot
Reward Model
- puwaer/Unsafe-Reward-Qwen3-1.7B