---
language:
- en
license: apache-2.0
tags:
- llama
- llama-3.2
- lora
- peft
- unsloth
- question-answering
- wikiqa
- fine-tuned
- 4-bit
datasets:
- microsoft/wiki_qa
base_model: unsloth/Llama-3.2-3B-bnb-4bit
model-index:
- name: llama-wikiqa-finetuned
  results: []
---

# LLaMA 3.2 3B — WikiQA Fine-tuned

A **parameter-efficient fine-tune** of LLaMA 3.2 3B, trained on the [WikiQA](https://huggingface.co/datasets/microsoft/wiki_qa) dataset for open-domain question answering. It was trained with LoRA adapters using [Unsloth](https://github.com/unslothai/unsloth), which advertises roughly 2× faster fine-tuning.

---

## Quick Start

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "bnpatel01/llama-wikiqa-finetuned",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
```

### Run Inference

```python
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

question = "What is the capital of France?"

inputs = tokenizer(
    [alpaca_prompt.format(question, "", "")],
    return_tensors="pt"
).to("cuda")

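outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
# Drop special tokens (e.g. end-of-text) and keep only the text after the response header
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
answer = decoded.split("### Response:")[1].strip()
print(answer)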
```

---

## Model Details

| Property | Value |
|---|---|
| **Base Model** | unsloth/Llama-3.2-3B-bnb-4bit |
| **Fine-tune Method** | LoRA (Low-Rank Adaptation) |
| **LoRA Rank** | 16 |
| **LoRA Alpha** | 16 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Quantization** | 4-bit (load_in_4bit=True) |
| **Max Seq Length** | 2048 tokens |
| **Adapter Size** | ~92.8 MB |
| **Framework** | Unsloth + HuggingFace PEFT |
| **Language** | English |
| **Task** | Open-Domain Question Answering |
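
The adapter size in the table can be sanity-checked from the LoRA settings. The sketch below counts LoRA parameters as `rank × (in_features + out_features)` for each target module. The layer dimensions are assumptions taken from the published Llama 3.2 3B configuration (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128); they are not stated on this card. Storing the adapter as 32-bit floats is also an assumption.

```python
# Assumed Llama 3.2 3B dimensions (not stated on this card).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 28
KV_DIM = 8 * 128  # 8 key/value heads × head dimension 128

RANK = 16  # LoRA rank from the table above

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    """Parameters in one LoRA pair: A is (rank × in), B is (out × rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGET_SHAPES.values())
total_params = per_layer * NUM_LAYERS
size_mib = total_params * 4 / 2**20  # 4 bytes per float32 parameter

print(f"{total_params:,} LoRA parameters")
print(f"{size_mib:.2f} MiB at 4 bytes per parameter")
```

Under these assumptions the total is 24,313,856 parameters, or 92.75 MiB, which is consistent with the ~92.8 MB listed in the table.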

---

## Dataset

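Trained on the [microsoft/wiki_qa](https://huggingface.co/datasets/microsoft/wiki_qa) dataset, a benchmark for open-domain QA that pairs each question with candidate answer sentences from Wikipedia. Each pair is labeled as correct (`label == 1`) or incorrect (`label == 0`).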

| Split | Rows (before the `label == 1` filter) |
|---|---|
| Train | 20,360 |
| Validation | 2,733 |
| Test | 6,165 |

Only samples with `label == 1` (correct answer–question pairs) were used for training.
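
The filtering step can be sketched as follows. This is a minimal illustration, not the original training script: the rows are hypothetical examples that mimic the `question`, `answer`, and `label` columns of `microsoft/wiki_qa`, and the mapping of question to Instruction and answer to Response is an assumption based on the inference example on this card.

```python
ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

# Hypothetical rows shaped like wiki_qa records (not real dataset entries).
rows = [
    {"question": "What is the capital of France?",
     "answer": "Paris is the capital and most populous city of France.",
     "label": 1},
    {"question": "What is the capital of France?",
     "answer": "France is a country in Western Europe.",
     "label": 0},
]


def build_training_texts(rows, eos_token=""):
    """Keep only correct pairs (label == 1) and render them as Alpaca prompts.

    Appending the tokenizer's EOS token is common practice; whether it was
    done for this model is not stated, so it is left as an optional argument.
    """
    return [
        ALPACA_PROMPT.format(row["question"], "", row["answer"]) + eos_token
        for row in rows
        if row["label"] == 1
    ]


texts = build_training_texts(rows)
print(len(texts))
print(texts[0])
```

With the real dataset, the same filter could be applied to the rows returned by `datasets.load_dataset("microsoft/wiki_qa")`.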

---

## Training Configuration

```python
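from transformers import TrainingArguments

# Only the arguments listed on this card are shown; all others are omitted.
TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,   # effective batch size: 2 × 4 = 8 per device
    warmup_steps = 5,
    num_train_epochs = 3,
    learning_rate = 2e-4,
    optim = "adamw_8bit",              # 8-bit AdamW (requires bitsandbytes)
)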
```

- **Epochs:** 3
- **Optimizer:** AdamW 8-bit
- **Precision:** bf16 (if supported), else fp16
- **Gradient checkpointing:** Unsloth optimized
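
To see what these settings imply, the sketch below works out the effective batch size and the number of optimizer steps. It assumes a single GPU, that the last partial batch is kept, and that a partial accumulation group counts as one step; exact rounding can differ between `transformers` versions. The sample count of 1,000 is a placeholder, not the size of the actual training set.

```python
import math


def training_schedule(num_samples, per_device_batch=2, grad_accum=4,
                      epochs=3, warmup_steps=5):
    """Estimate optimizer steps for the configuration shown above."""
    effective_batch = per_device_batch * grad_accum
    micro_batches = math.ceil(num_samples / per_device_batch)
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_fraction": warmup_steps / total_steps,
    }


schedule = training_schedule(num_samples=1_000)  # placeholder sample count
print(schedule)
```

For 1,000 samples this gives an effective batch size of 8, 125 steps per epoch, and 375 steps in total, so the 5 warmup steps cover a little over 1% of training.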

---

## Prompt Format

This model uses the **Alpaca instruction format**:

```
### Instruction:
<your question here>

### Input:
<optional context, leave empty for QA>

### Response:
<model answer>
```
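
The template can be wrapped in two small helpers, one to build the prompt and one to pull the answer out of the generated text. The function names `build_prompt` and `extract_answer` are hypothetical and not part of Unsloth or this repository. Cutting the answer at a following `### Instruction:` header is an assumption about how the model might continue past its answer.

```python
RESPONSE_HEADER = "### Response:"


def build_prompt(question, context=""):
    """Render the Alpaca template with an empty response slot."""
    return (
        f"### Instruction:\n{question}\n\n"
        f"### Input:\n{context}\n\n"
        f"{RESPONSE_HEADER}\n"
    )


def extract_answer(generated_text):
    """Return the text after the first response header.

    Falls back to the whole text if the header is missing, and stops at the
    next instruction header if the model starts a new example.
    """
    _, found, tail = generated_text.partition(RESPONSE_HEADER)
    if not found:
        return generated_text.strip()
    return tail.split("### Instruction:")[0].strip()


prompt = build_prompt("What is the capital of France?")
print(extract_answer(prompt + "Paris"))
```

In practice, `generated_text` would be the decoded output of `model.generate`, with special tokens skipped.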

---

## Requirements

```bash
pip install unsloth
pip install torch transformers peft
```

Recommended: **Google Colab** with a T4 or A100 GPU, or any CUDA-capable GPU with at least 8 GB of VRAM.

---

## Limitations

- Trained only on WikiQA — best suited for factoid, Wikipedia-style questions
- May not perform well on complex reasoning or multi-hop questions
- Knowledge is limited to the base LLaMA 3.2 training data cutoff
- Responses may occasionally be incorrect or hallucinated

---

## License

The LoRA adapter weights in this repository are released under the **Apache 2.0** license. Use of the base model remains subject to Meta's [LLaMA 3.2 Community License](https://www.llama.com/llama3_2/license/).

---

## Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth) — for making fine-tuning 2× faster
- [Meta AI](https://ai.meta.com/) — for the LLaMA 3.2 base model
- [Microsoft Research](https://www.microsoft.com/en-us/research/) — for the WikiQA dataset

---

*Made with ❤️ by [bnpatel01](https://huggingface.co/bnpatel01)*