---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/structured_data_with_cot_dataset_512_v2
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- qlora
- lora
- structured-output
---

# qwen3-4b-structured-output-lora-v5.5C

This repository provides a **LoRA adapter (v5.5C)** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using **QLoRA (4-bit, Unsloth)**.

This repository contains **LoRA adapter weights only**. The base model must be loaded separately.

## Version: v5.5C - Hyperparameter Tuning

v5.5C is an SFT run focused on **hyperparameter tuning** to address the overfitting seen in v5. v5 scored 0.73981, below v2 (0.75074); the drop is attributed to overfitting from training for 2 epochs.

### Changes from v5

| Parameter | v5 | v5.5C | Rationale |
|---|---|---|---|
| Dataset | 3,869 samples | **3,869 samples** | Same (XML errors removed) |
| MAX_SEQ_LEN | 1024 | **1024** | Same |
| Epochs | 2 | **1** | Reduced to prevent overfitting |
| Learning Rate | 5e-6 | **1e-06** | Lower for more stable training |
| Warmup Ratio | 10% | **10%** | Increased to reduce early instability |

### v5.5C Key Improvements

1. **Epoch=1**: v5's Epoch=2 caused overfitting (train/loss increased at the end of training)
2. **Learning Rate 1e-06**: Lowered from v5's 5e-6 for more conservative learning and less overfitting
3. **Warmup Ratio 10%**: Longer warmup for training stability
4. **Data unchanged**: 3,869 samples with XML errors removed

### Score History

| Version | Samples | Score | Notes |
|---|---|---|---|
| v2 | 3,933 | **0.75074** | Best score baseline |
| v5 | 3,869 | 0.73981 | Epoch=2 overfitting |
| v5.5C | 3,869 | (pending) | Hyperparameter tuning |

## Training Objective

This adapter is trained to improve **structured output accuracy** (JSON / YAML / XML / TOML / CSV) on the StructEval-T benchmark.

Loss is applied only to the final assistant output; the intermediate reasoning (Chain-of-Thought) is masked out of the loss.

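
The masking described above can be illustrated with a minimal sketch. It assumes the token index where the final output begins is already known; the helper name `mask_labels` and the toy token ids are hypothetical and are not taken from the actual training script. The value `-100` is the label that PyTorch's cross-entropy loss ignores by default.

```python
# Minimal sketch of loss masking (hypothetical helper, not the actual training code).
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy by default


def mask_labels(input_ids, final_output_start):
    """Copy input_ids into labels, masking every token before final_output_start."""
    if not 0 <= final_output_start <= len(input_ids):
        raise ValueError("final_output_start must lie within the sequence")
    masked = [IGNORE_INDEX] * final_output_start
    return masked + list(input_ids[final_output_start:])


# Toy example: 5 prompt/reasoning tokens followed by 3 final-output tokens.
prompt_and_cot = [11, 12, 13, 14, 15]
final_output = [21, 22, 23]
input_ids = prompt_and_cot + final_output

labels = mask_labels(input_ids, len(prompt_and_cot))
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23]
```

In a real pipeline the start index would come from locating the boundary between reasoning and final answer in the tokenized sample; how that boundary is marked in the dataset is not specified here.
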
## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: QLoRA (4-bit, Unsloth)
- Max sequence length: 1024
- Epochs: 1
- Learning rate: 1e-06
- Warmup ratio: 10%
- Batch size: 2 (effective: 16)
- Gradient accumulation: 8
- LoRA: r=64, alpha=128
- CoT masking: enabled (loss on final output only)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
# Replace "your_id" with the account that hosts this adapter.
adapter = "your_id/qwen3-4b-structured-output-lora-v5.5C"

# Load the tokenizer and the base model first; the adapter holds LoRA weights only.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```

## Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v2

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.

Compliance: Users must comply with the MIT license (including the copyright notice) and the base model's original terms of use.