Model Card for 522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO

This model is a DPO-aligned version of Qwen/Qwen1.5-1.8B-Chat, trained to produce safer, more helpful responses in Vietnamese through Direct Preference Optimization (DPO). It builds on the previously fine-tuned SFT model, using preference pairs (chosen vs rejected completions) to optimize for alignment and instruction-following.

🧠 Model Details

Base Model: Qwen/Qwen1.5-1.8B-Chat
Model Type: Causal Language Model (Chat)
Languages: Vietnamese
License: Apache 2.0
Fine-tuning Method: Direct Preference Optimization (DPO)
Training Framework: HuggingFace TRL (transformers + accelerate)
Preference Dataset: 10,000 Vietnamese prompt pairs (chosen/rejected)

✅ Intended Uses

Direct Use

Aligned, polite, and safe Vietnamese conversation
General instruction following and open-domain QA
Scenarios requiring sensitive language generation

Out-of-Scope Use

Safety-critical systems (e.g. legal, medical, financial advice)
English-language generation (model is tuned for Vietnamese)
Jailbreak testing without appropriate safeguards

📦 Dataset

Source: Translated + curated prompts from HarmBench, OpenAssistant, and JailbreakEval
Format: JSONL pairs of prompts with chosen and rejected completions
Composition: 60% safe / 40% unsafe adversarial instructions
Filtering: Detoxify used to score and select completions with varied toxicity

🧪 Evaluation

Metrics

Helpfulness (human-rated or automated ranking)
Toxicity (via Detoxify threshold > 0.5)
Preference Win Rate vs. base SFT model

Summary

85% rejection on unsafe or adversarial inputs
80% preference over SFT responses on standard Vietnamese prompts
Maintains helpfulness while increasing safety and appropriateness

🚀 How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")
model = AutoModelForCausalLM.from_pretrained("522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO")

# Chat with the model
prompt = "Tôi cần lời khuyên để giảm stress công việc."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Downloads last month: -

Model tree for 522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO

Base model

Qwen/Qwen1.5-1.8B-Chat

Adapter

(335)

this model