---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# Qwen3-4B Structured-Output SFT + DPO Aligned Model

This repository provides a **merged 16-bit model** derived from **Qwen/Qwen3-4B-Instruct-2507**.

The model was trained using **Direct Preference Optimization (DPO)** via the **Unsloth** library, **starting from an SFT-initialized LoRA adapter**.

**Training pipeline:**
Base model → Structured-output SFT (LoRA) → DPO preference alignment → merged_16bit export

## What is included

- ✅ **Full merged 16-bit weights** (no adapter loading required)
- ✅ Tokenizer files
- ✅ Model card (this README)

## Training Pipeline

### Step 1: SFT Initialization (LoRA)

Before DPO, the base model was initialized with an SFT LoRA adapter to improve **structured output behavior** (JSON / YAML / XML / TOML / CSV style formatting).

SFT initialization adapter: `MSakae/qwen3-4b-structured-output-lora_sample_try_L4`

### Step 2: DPO Alignment

The SFT-initialized model was further optimized using DPO with a preference dataset.

DPO dataset: `u-10bei/dpo-dataset-qwen-cot`

DPO aims to improve:

- Preference alignment between chosen vs rejected responses
- Response consistency and selection quality
- Structured response quality under preference constraint

## Training Configuration

- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 1e-06
- **Beta**: 0.1
- **Max sequence length**: 1024

## Usage

Since this is a merged model, you can use it directly with `transformers`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_id/your-repo-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Test inference
prompt = "Your question here"
# return_dict=True returns input_ids and attention_mask as a mapping,
# which is what the **inputs unpacking below requires
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto"

outputs = model.generate(**inputs, max_new_tokens=512)
# Prints the prompt followed by the completion, including special tokens
print(tokenizer.decode(outputs[0]))
```

## Sources & License (IMPORTANT)

* **Preference dataset**: u-10bei/dpo-dataset-qwen-cot
* **Base model terms**: Users must comply with the original base model’s license/terms of use.
* **Dataset terms**: Users must comply with the dataset license/terms (including any required notices).
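
## Note on the DPO Beta Parameter

The training configuration lists **Beta**: 0.1. To illustrate what that value scales, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It is not the Unsloth training code used for this model: the function name `dpo_pair_loss` and its argument names are hypothetical, and the inputs are assumed to be the summed log-probabilities of each response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Hypothetical helper for illustration only. All inputs are sequence
    log-probabilities (floats); beta defaults to the value in this card.
    """
    # How much more the policy favors each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales the margin between the two log-ratios
    z = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)), written as a numerically stable softplus(-z)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: the margin is zero and the loss is log(2)
print(dpo_pair_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favors the chosen response more than the reference does: lower loss
print(dpo_pair_loss(-8.0, -14.0, -10.0, -12.0))
```

In the DPO formulation, `beta` sets the strength of the implicit constraint toward the reference model, so smaller values allow the policy to move further away from it for the same preference margin.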