 ---
+library_name: peft
+license: apache-2.0
+base_model: Qwen/Qwen2.5-1.5B-Instruct
+tags:
+  - qlora
+  - lora
+  - fine-tuning
+  - reasoning
+  - qwen2.5
+  - openthoughts
+  - 4-bit
+  - nf4
+datasets:
+  - open-thoughts/OpenThoughts-114k
+language:
+  - en
+pipeline_tag: text-generation
+model-index:
+  - name: qwen2.5-iq-Finetuning-qlora
+    results: []
 ---
+# Qwen2.5-1.5B-Instruct — QLoRA Fine-Tuned on OpenThoughts-114k
+A QLoRA adapter for [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), fine-tuned on curated reasoning traces from [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) to produce clean, structured, step-by-step solutions.
+## Key Details
+| Property | Value |
+|---|---|
+| **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
+| **Method** | QLoRA (4-bit NF4 + LoRA) |
+| **Dataset** | 30K samples from OpenThoughts-114k |
+| **Hardware** | Single NVIDIA T4 (16 GB VRAM, free Colab) |
+| **Adapter Size** | ~50 MB |
+| **Trainable Params** | ~1.5% of total model parameters |
+## What This Adapter Does
+The base Qwen2.5-1.5B-Instruct model produces reasonable answers but tends to be verbose and sometimes loses structure in multi-step reasoning. On the evaluation prompts reported in the Evaluation section, this adapter improves:
+- **Response conciseness**: outputs are ~12% shorter on average (314 → 275 tokens), cutting filler while retaining substance
+- **Step-by-step structure**: cleaner formatting with numbered steps and proper LaTeX math notation
+- **Reasoning accuracy**: answers the "5 machines / 5 widgets" trick question correctly where the base model fails; the other spot-checked prompts were already answered correctly by the base model
+## Training Details
+### Quantization
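+```python
+import torch
+from transformers import BitsAndBytesConfig
+# 4-bit NF4 weights with double quantization; computation runs in fp16
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_use_double_quant=True,
+)
+```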
+### LoRA Configuration
+```python
+from peft import LoraConfig
+# LoRA adapters on every attention and MLP projection
+lora_config = LoraConfig(
+    r=32,
+    lora_alpha=64,
+    lora_dropout=0.05,
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+    bias="none",
+    task_type="CAUSAL_LM",
+)
+```
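As a rough illustration of what `r` and `lora_alpha` control, the sketch below computes the standard LoRA scaling factor (`lora_alpha / r`) and the number of trainable parameters one adapter adds to a single linear layer. The helper names and the 1024 × 1024 layer shape are illustrative assumptions, not the actual dimensions of Qwen2.5-1.5B-Instruct.

```python
def lora_scaling(r, lora_alpha):
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r


def lora_params_per_layer(d_in, d_out, r):
    """Trainable parameters for one adapted linear layer.

    A has shape (r, d_in) and B has shape (d_out, r), so the adapter
    adds r * (d_in + d_out) parameters regardless of the frozen weight size.
    """
    return r * (d_in + d_out)


# Values from the LoRA configuration above.
scaling = lora_scaling(r=32, lora_alpha=64)

# Hypothetical layer shape, chosen only to make the arithmetic easy to follow.
example_params = lora_params_per_layer(d_in=1024, d_out=1024, r=32)

print(f"scaling factor: {scaling}")
print(f"adapter parameters for a 1024x1024 layer: {example_params}")
```

Doubling `r` doubles the adapter size, while keeping `lora_alpha = 2 * r` leaves the scaling factor unchanged.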
+### Training Hyperparameters
+| Parameter | Value |
+|---|---|
+| Epochs | 1 |
+| Batch size | 1 per device × 4 gradient accumulation steps (effective batch size 4) |
+| Learning rate | 2e-4 |
+| Scheduler | Cosine with 50-step warmup |
+| Optimizer | Paged AdamW 8-bit |
+| Max sequence length | 2048 |
+| NEFTune noise alpha | 5 |
+| Precision | fp16 |
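To make the batch size and scheduler rows concrete, here is a minimal sketch of the arithmetic: the approximate number of optimizer updates in one epoch, and a linear-warmup-then-cosine-decay learning-rate curve. The function names are hypothetical, the 30K sample count is taken from the Key Details table (substitute the real number of training examples), and the trainer's own scheduler may differ in small details such as rounding.

```python
import math


def optimizer_steps(num_samples, per_device_batch=1, grad_accum=4, epochs=1):
    """Approximate optimizer updates on a single GPU."""
    effective_batch = per_device_batch * grad_accum
    return math.ceil(num_samples / effective_batch) * epochs


def lr_at(step, total_steps, peak_lr=2e-4, warmup_steps=50):
    """Linear warmup to peak_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total = optimizer_steps(30_000)  # assumed sample count, see lead-in
for step in (0, 25, 50, total // 2, total):
    print(f"step {step:>5}: lr = {lr_at(step, total):.2e}")
```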
+### Data Preprocessing — The Critical Step
+The OpenThoughts-114k dataset contains DeepSeek-R1 reasoning traces with two sections:
+- `<|begin_of_thought|>`: thousands of tokens of raw internal reasoning
+- `<|begin_of_solution|>`: the clean, structured final answer
+**We train only on the extracted solution block.** In our runs, training on the full traces led the model to produce rambling, unfocused output. Extracting only the solution with a simple regex gave markedly better output quality with the same model and the same hyperparameters.
+```python
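+import re
+# Assumes `tokenizer` (the Qwen2.5 tokenizer) is already loaded in the enclosing scope.
+def formatting_func(example):
+    """Render one example as chat-template text, keeping only the final solution."""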
+    role_map = {"human": "user", "gpt": "assistant"}
+    messages = []
+    if example.get("system"):
+        messages.append({"role": "system", "content": example["system"]})
+    for turn in example["conversations"]:
+        role = role_map.get(turn["from"], turn["from"])
+        content = turn["value"]
+        # Extract only the final solution; if the markers are missing, keep the turn unchanged
+        if role == "assistant":
+            match = re.search(
+                r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>",
+                content, re.DOTALL,
+            )
+            if match:
+                content = match.group(1).strip()
+        messages.append({"role": role, "content": content})
+    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
+```
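The extraction step can be tried in isolation, without a tokenizer or the dataset. The following is a minimal sketch that applies the same regular expression as `formatting_func` to a made-up trace; `extract_solution` and `sample_trace` are illustrative names, and the trace text is invented.

```python
import re

# Same pattern as in formatting_func above.
SOLUTION_RE = re.compile(
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL
)


def extract_solution(text):
    """Return the solution block if both markers are present, else the text unchanged."""
    match = SOLUTION_RE.search(text)
    return match.group(1).strip() if match else text


# Hypothetical trace, shortened for illustration.
sample_trace = (
    "<|begin_of_thought|>Let me think... maybe 4? No, check again."
    "<|end_of_thought|>\n"
    "<|begin_of_solution|>\n1. Add the numbers.\n2. The answer is 5.\n"
    "<|end_of_solution|>"
)

print(extract_solution(sample_trace))
```

The non-greedy `(.*?)` together with `re.DOTALL` lets the match span multiple lines while stopping at the first closing marker.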
+### Response Masking
+Labels are set to `-100` on all non-assistant tokens, and `DataCollatorForSeq2Seq` pads each batch (label padding also uses `-100`), so the cross-entropy loss is computed only on the tokens the model needs to generate at inference time. This improves sample efficiency, since every gradient update is spent on the assistant's response.
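The masking idea can be illustrated with plain Python lists. This is a sketch under simplifying assumptions, not the training code: the token IDs are made up, each example is a single prompt followed by a single response, and `build_labels` and `pad_batch` are hypothetical helpers that mimic what label masking and label-aware padding produce.

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss skips this label value by default


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask the prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def pad_batch(examples, pad_token_id):
    """Right-pad inputs with pad_token_id and labels with IGNORE_INDEX."""
    max_len = max(len(input_ids) for input_ids, _ in examples)
    batch_inputs, batch_labels = [], []
    for input_ids, labels in examples:
        n_pad = max_len - len(input_ids)
        batch_inputs.append(input_ids + [pad_token_id] * n_pad)
        batch_labels.append(labels + [IGNORE_INDEX] * n_pad)
    return batch_inputs, batch_labels


# Made-up token IDs: a 3-token prompt with a 2-token answer, and a shorter pair.
examples = [build_labels([11, 12, 13], [21, 22]), build_labels([14], [23])]
inputs, labels = pad_batch(examples, pad_token_id=0)
print(inputs)
print(labels)
```

Only positions whose label is not `-100` contribute to the loss, so prompt tokens and padding are both ignored.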
+## Usage
+### Load with PEFT
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+from peft import PeftModel
+import torch
+# Load base model in 4-bit
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_use_double_quant=True,
+)
+base_model = AutoModelForCausalLM.from_pretrained(
+    "Qwen/Qwen2.5-1.5B-Instruct",
+    quantization_config=bnb_config,
+    device_map="auto",
+    trust_remote_code=True,
+)
+# Load adapter
+model = PeftModel.from_pretrained(base_model, "rahmasaber/qwen2.5-iq-Finetuning-qlora")
+tokenizer = AutoTokenizer.from_pretrained("rahmasaber/qwen2.5-iq-Finetuning-qlora")
+model.eval()
+```
+### Generate
+```python
+messages = [
+    {"role": "system", "content": "You are a helpful assistant that thinks step-by-step."},
+    {"role": "user", "content": "If 5 machines produce 5 widgets in 5 minutes, how many minutes for 100 machines to produce 100 widgets?"},
+]
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    output = model.generate(
+        **inputs,
+        max_new_tokens=512,
+        temperature=0.7,
+        top_p=0.9,
+        do_sample=True,
+    )
+response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+print(response)
+```
+### Compare Base vs Fine-Tuned
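+```python
+# Minimal helper; reuses `model` and `tokenizer` from the sections above.
+def generate(prompt, max_new_tokens=512):
+    """Greedy-decode a reply to an already-templated prompt so both runs are comparable."""
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+    with torch.no_grad():
+        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
+    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+# Disable adapter → base model behavior
+model.disable_adapter_layers()
+base_response = generate(prompt)
+# Enable adapter → fine-tuned behavior
+model.enable_adapter_layers()
+ft_response = generate(prompt)
+```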
+## Evaluation
+Tested on 10 handcrafted reasoning prompts across 5 categories:
+| Category | # Prompts | What it tests |
+|---|---|---|
+| Logic Puzzles | 2 | Trick questions, careful reading |
+| Math | 3 | Word problems, sequential operations |
+| Reasoning | 2 | Formal logic, deductive puzzles |
+| Code | 1 | Algorithm complexity analysis |
+| Science | 2 | Physics principles, Archimedes |
+### Results vs Base Model
+| Metric | Base | Fine-Tuned |
+|---|---|---|
+| Avg response length (tokens) | 314 | 275 (-12%) |
+| Correct on "all but 9 sheep" | ✅ | ✅ |
+| Correct on average speed (harmonic mean) | ✅ | ✅ |
+| Correct on discount stacking (32%) | ✅ | ✅ |
+| Correct on 5 machines/5 widgets | ❌ | ✅ |
+| Structured step-by-step format | Sometimes | Consistently |
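Two of the figures in the table can be checked by hand. The sketch below recomputes the relative change in average response length and the reference answer to the machines-and-widgets question; the helper names are illustrative, and the reference answer assumes every machine works at the same constant rate.

```python
def relative_change(before, after):
    """Signed relative change, e.g. -0.12 for a 12% reduction."""
    return (after - before) / before


def minutes_needed(machines, widgets, ref_machines=5, ref_widgets=5, ref_minutes=5):
    """Time for `machines` to make `widgets`, given the reference production run.

    One machine makes ref_widgets / (ref_machines * ref_minutes) widgets per minute,
    so time = widgets / (machines * rate), written to avoid intermediate rounding.
    """
    return widgets * ref_machines * ref_minutes / (machines * ref_widgets)


length_change = relative_change(314, 275)
print(f"average length change: {length_change:.1%}")
print(f"minutes for 100 machines to make 100 widgets: {minutes_needed(100, 100)}")
```

The puzzle's trap is scaling the time with the number of widgets; because the number of machines scales too, the time stays the same as in the reference run.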
+### Held-Out Test Set
+200 examples were held out from the training sample to detect overfitting. The gap between training loss and held-out loss stayed below 0.5, which suggests the model is generalizing rather than memorizing.
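One way such a check could be set up is sketched below. How the split was actually made is not stated here, so the shuffling, the seed, and the helper names `split_held_out` and `loss_gap_ok` are assumptions; only the 200-example hold-out size and the 0.5 threshold come from the text above.

```python
import random


def split_held_out(examples, n_test=200, seed=0):
    """Shuffle a copy of the data and reserve n_test examples for evaluation."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def loss_gap_ok(train_loss, eval_loss, max_gap=0.5):
    """True while the held-out loss exceeds the training loss by less than max_gap."""
    return (eval_loss - train_loss) < max_gap


# Stand-in for the sampled dataset: integers instead of real examples.
train_set, test_set = split_held_out(range(1000))
print(len(train_set), len(test_set))

# Made-up loss values, only to show how the threshold is applied.
print(loss_gap_ok(train_loss=1.0, eval_loss=1.3))
print(loss_gap_ok(train_loss=1.0, eval_loss=1.6))
```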
+## Limitations
+- **Small base model** — 1.5B parameters limits complex multi-hop reasoning
+- **1 epoch on 1.2K-3K samples** — more data and epochs would improve accuracy
+- **Self-evaluation bias** — LLM-as-judge uses the same model family; use a stronger external model (GPT-4, Claude) for rigorous evaluation
+- **Science questions** — the fine-tuned model occasionally gets physics wrong (e.g., feather vs bowling ball on Moon)
+- **No benchmark scores** — not evaluated on GSM8K, MATH, or HumanEval yet
+## Files
+```
+.
+├── adapter_config.json        # LoRA configuration
+├── adapter_model.safetensors  # LoRA weights (~50MB)
+├── tokenizer_config.json      # Tokenizer settings
+├── tokenizer.json             # Tokenizer vocabulary
+├── special_tokens_map.json    # Special token mappings
+└── README.md                  # This file
+```
+## Citation
+```bibtex
+@misc{saber2026qwen25qlora,
+  title={QLoRA Fine-Tuning Qwen2.5-1.5B-Instruct on OpenThoughts-114k},
+  author={Rahma Saber},
+  year={2026},
+  url={https://huggingface.co/rahmasaber/qwen2.5-iq-Finetuning-qlora}
+}
+```
+## Acknowledgments
+- [Qwen Team](https://huggingface.co/Qwen) for the base model
+- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) for the reasoning dataset
+- [Hugging Face](https://huggingface.co/) for PEFT, TRL, and the Hub
+- [Google Colab](https://colab.research.google.com/) for free GPU access