---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# Qwen3-4B Structured-Output SFT + DPO Aligned Model

This repository provides a **merged 16-bit model** derived from **Qwen/Qwen3-4B-Instruct-2507**.
The model was trained using **Direct Preference Optimization (DPO)** via the **Unsloth** library, **starting from an SFT-initialized LoRA adapter**.

**Training pipeline:**
Base model → Structured-output SFT (LoRA) → DPO preference alignment → merged_16bit export

## What is included

✅ **Full merged 16-bit weights** (no adapter loading required)  
✅ Tokenizer files  
✅ Model card (this README)

## Training Pipeline

### Step 1 — SFT Initialization (LoRA)
Before DPO, the base model was initialized with an SFT LoRA adapter to improve **structured output behavior** (JSON / YAML / XML / TOML / CSV style formatting).

SFT initialization adapter:
`MSakae/qwen3-4b-structured-output-lora_sample_try_L4`

### Step 2 — DPO Alignment
The SFT-initialized model was further optimized using DPO with a preference dataset.

DPO dataset:
`u-10bei/dpo-dataset-qwen-cot`

DPO aims to:
- Shift the model toward the *chosen* response and away from the *rejected* response in each preference pair
- Make responses more consistent
- Improve the quality of structured responses while optimizing for these preferences
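For reference, the standard (sigmoid) DPO objective for a single preference pair can be sketched as below. This is a minimal scalar illustration that assumes the default sigmoid loss variant was used. It is not the training code, since Unsloth/TRL compute the loss on batched tensors. The inputs are the summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model. `beta` defaults to the 0.1 listed in the training configuration.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair."""
    # Log-ratio of policy to reference probability for each response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)), computed without overflow
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss is log 2 (about 0.693). Raising the chosen response's log-ratio above the rejected one's pushes the loss toward zero. A larger `beta` makes the loss react more sharply to the same log-ratio gap.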

## Training Configuration
- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Method**: DPO (Direct Preference Optimization)
- **Epochs**: 1
- **Learning rate**: 1e-06
- **Beta**: 0.1
- **Max sequence length**: 1024 tokens

## Usage
Since this is a merged model, you can use it directly with `transformers`.

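```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your_id/your-repo-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # load the weights in the 16-bit dtype they were saved in
    device_map="auto",
)

# Test inference
prompt = "Your question here"
messages = [{"role": "user", "content": prompt}]
# return_dict=True returns input_ids and attention_mask so they can be unpacked into generate()
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens and drop special tokens such as <|im_end|>
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```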

## Sources & License (IMPORTANT)
* **Preference dataset**: u-10bei/dpo-dataset-qwen-cot
* **Base model terms**: Users must comply with the original base model’s license/terms of use.
* **Dataset terms**: Users must comply with the dataset license/terms (including any required notices).