--- language: - en license: apache-2.0 tags: - llama - llama-3.2 - lora - peft - unsloth - question-answering - wikiqa - fine-tuned - 4-bit datasets: - microsoft/wiki_qa base_model: unsloth/Llama-3.2-3B-bnb-4bit model-index: - name: llama-wikiqa-finetuned results: [] --- # LLaMA 3.2 3B — WikiQA Fine-tuned A **parameter-efficient fine-tuned** version of LLaMA 3.2 3B, trained on the [WikiQA](https://huggingface.co/datasets/microsoft/wiki_qa) dataset for open-domain question answering. Built using [Unsloth](https://github.com/unslothai/unsloth) for 2× faster training with LoRA adapters. --- ## Quick Start ```python from unsloth import FastLanguageModel model, tokenizer = FastLanguageModel.from_pretrained( model_name = "bnpatel01/llama-wikiqa-finetuned", max_seq_length = 2048, dtype = None, load_in_4bit = True, ) FastLanguageModel.for_inference(model) ``` ### Run Inference ```python alpaca_prompt = """### Instruction: {} ### Input: {} ### Response: {}""" question = "What is the capital of France?" inputs = tokenizer( [alpaca_prompt.format(question, "", "")], return_tensors="pt" ).to("cuda") outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True) answer = tokenizer.batch_decode(outputs)[0].split("### Response:")[1].strip() print(answer) ``` --- ## Model Details | Property | Value | |---|---| | **Base Model** | unsloth/Llama-3.2-3B-bnb-4bit | | **Fine-tune Method** | LoRA (Low-Rank Adaptation) | | **LoRA Rank** | 16 | | **LoRA Alpha** | 16 | | **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | | **Quantization** | 4-bit (load_in_4bit=True) | | **Max Seq Length** | 2048 tokens | | **Adapter Size** | ~92.8 MB | | **Framework** | Unsloth + HuggingFace PEFT | | **Language** | English | | **Task** | Open-Domain Question Answering | --- ## Dataset Trained on the [microsoft/wiki_qa](https://huggingface.co/datasets/microsoft/wiki_qa) dataset — a benchmark for open-domain QA using Wikipedia passages. | Split | Samples (after label=1 filter) | |---|---| | Train | 6,165 | | Validation | 2,733 | | Test | 20,360 | Only samples with `label == 1` (correct answer–question pairs) were used for training. --- ## Training Configuration ```python TrainingArguments( per_device_train_batch_size = 2, gradient_accumulation_steps = 4, warmup_steps = 5, num_train_epochs = 3, learning_rate = 2e-4, optim = "adamw_8bit", ) ``` - **Epochs:** 3 - **Optimizer:** AdamW 8-bit - **Precision:** bf16 (if supported), else fp16 - **Gradient checkpointing:** Unsloth optimized --- ## Prompt Format This model uses the **Alpaca instruction format**: ``` ### Instruction: ### Input: ### Response: ``` --- ## Requirements ```bash pip install unsloth pip install torch transformers peft ``` Recommended: **Google Colab** with T4/A100 GPU or any CUDA-capable GPU with 8GB+ VRAM. --- ## Limitations - Trained only on WikiQA — best suited for factoid, Wikipedia-style questions - May not perform well on complex reasoning or multi-hop questions - Knowledge is limited to the base LLaMA 3.2 training data cutoff - Responses may occasionally be incorrect or hallucinated --- ## License This model is released under the **Apache 2.0** license. The base model follows Meta's [LLaMA 3.2 Community License](https://www.llama.com/llama3_2/license/). --- ## Acknowledgements - [Unsloth](https://github.com/unslothai/unsloth) — for making fine-tuning 2× faster - [Meta AI](https://ai.meta.com/) — for the LLaMA 3.2 base model - [Microsoft Research](https://www.microsoft.com/en-us/research/) — for the WikiQA dataset --- *Made with ❤️ by [bnpatel01](https://huggingface.co/bnpatel01)*