Instructions to use 522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use 522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B-Chat") model = PeftModel.from_pretrained(base_model, "522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO") - Notebooks
- Google Colab
- Kaggle
Model Card for 522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO
This model is a DPO-aligned version of Qwen/Qwen1.5-1.8B-Chat, trained to produce safer, more helpful responses in Vietnamese through Direct Preference Optimization (DPO). It builds on the previously fine-tuned SFT model, using preference pairs (chosen vs rejected completions) to optimize for alignment and instruction-following.
π§ Model Details
- Base Model: Qwen/Qwen1.5-1.8B-Chat
- Model Type: Causal Language Model (Chat)
- Languages: Vietnamese
- License: Apache 2.0
- Fine-tuning Method: Direct Preference Optimization (DPO)
- Training Framework: HuggingFace TRL (transformers + accelerate)
- Preference Dataset: 10,000 Vietnamese prompt pairs (chosen/rejected)
β Intended Uses
Direct Use
- Aligned, polite, and safe Vietnamese conversation
- General instruction following and open-domain QA
- Scenarios requiring sensitive language generation
Out-of-Scope Use
- Safety-critical systems (e.g. legal, medical, financial advice)
- English-language generation (model is tuned for Vietnamese)
- Jailbreak testing without appropriate safeguards
π¦ Dataset
- Source: Translated + curated prompts from HarmBench, OpenAssistant, and JailbreakEval
- Format: JSONL pairs of prompts with
chosenandrejectedcompletions - Composition: 60% safe / 40% unsafe adversarial instructions
- Filtering: Detoxify used to score and select completions with varied toxicity
π§ͺ Evaluation
Metrics
- Helpfulness (human-rated or automated ranking)
- Toxicity (via Detoxify threshold > 0.5)
- Preference Win Rate vs. base SFT model
Summary
85% rejection on unsafe or adversarial inputs
80% preference over SFT responses on standard Vietnamese prompts
- Maintains helpfulness while increasing safety and appropriateness
π How to Use
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load model
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")
model = AutoModelForCausalLM.from_pretrained("522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO")
# Chat with the model
prompt = "TΓ΄i cαΊ§n lα»i khuyΓͺn Δα» giαΊ£m stress cΓ΄ng viα»c."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Downloads last month
- -
Model tree for 522H0134-NguyenNhatHuy/Qwen-1.5-1.8B-Chat-DPO
Base model
Qwen/Qwen1.5-1.8B-Chat