--- language: - en license: apache-2.0 tags: - llama - llama-3.2 - lora - peft - unsloth - fine-tuning - generative-ai - conversational-ai - personas - india - 4-bit - instruction-tuning datasets: - NVIDIA/Nemotron-Personas base_model: unsloth/Llama-3.2-3B-bnb-4bit model-index: - name: llama-Nemotron-Personas-India-finetuned results: [] --- # LLaMA 3.2 3B — Nemotron Personas India Fine-tuned A parameter-efficient fine-tuned version of LLaMA 3.2 3B, trained using the NVIDIA Nemotron Personas dataset with Indian contextual and conversational style tuning. This model is built using Unsloth for faster and memory-efficient LoRA fine-tuning. --- # Model Overview This model is designed to generate: - More natural conversational responses - Indian contextual understanding - Persona-aware dialogue - Human-like assistant interactions - Better response personalization The model is fine-tuned using LoRA adapters on top of: `unsloth/Llama-3.2-3B-bnb-4bit` --- # Quick Start ## Install Requirements ```bash pip install unsloth pip install torch transformers peft accelerate bitsandbytes ``` --- # Load Model ```python from unsloth import FastLanguageModel model, tokenizer = FastLanguageModel.from_pretrained( model_name = "Sachin016/llama-Nemotron-Personas-India-finetuned", max_seq_length = 2048, dtype = None, load_in_4bit = True, ) FastLanguageModel.for_inference(model) ``` --- # Run Inference ```python alpaca_prompt = """### Instruction: {} ### Input: {} ### Response: {}""" question = "Explain AI in simple words" inputs = tokenizer( [alpaca_prompt.format(question, "", "")], return_tensors = "pt" ).to("cuda") outputs = model.generate( **inputs, max_new_tokens = 128, use_cache = True ) response = tokenizer.batch_decode(outputs)[0].split("### Response:")[1].strip() print(response) ``` --- # Model Details | Property | Value | |---|---| | Base Model | unsloth/Llama-3.2-3B-bnb-4bit | | Fine-tuning Method | LoRA | | Quantization | 4-bit | | Framework | Unsloth + HuggingFace PEFT | | Max Sequence Length | 2048 | | Language | English | | Training Type | Instruction Fine-tuning | | Domain | Conversational AI | | Specialization | Persona-based responses | --- # Dataset This model was fine-tuned using: - NVIDIA Nemotron Personas Dataset The dataset helps improve: - Conversational quality - Personality consistency - Human-like dialogue generation - Contextual response handling --- # Training Configuration ```python TrainingArguments( per_device_train_batch_size = 2, gradient_accumulation_steps = 4, warmup_steps = 5, num_train_epochs = 3, learning_rate = 2e-4, optim = "adamw_8bit", ) ``` --- # Prompt Format This model uses the Alpaca instruction format. ```text ### Instruction: ### Input: ### Response: ``` --- # Upload to Hugging Face ```python from huggingface_hub import login login(token="YOUR_HUGGINGFACE_TOKEN") from unsloth import FastLanguageModel model, tokenizer = FastLanguageModel.from_pretrained( model_name = "/content/drive/MyDrive/llama_project/finetuned_model", max_seq_length = 2048, dtype = None, load_in_4bit = True, ) model.push_to_hub( "Sachin016/llama-Nemotron-Personas-India-finetuned", token="YOUR_HUGGINGFACE_TOKEN" ) tokenizer.push_to_hub( "Sachin016/llama-Nemotron-Personas-India-finetuned", token="YOUR_HUGGINGFACE_TOKEN" ) ``` --- # Recommended Hardware Recommended GPUs: - NVIDIA T4 - NVIDIA A100 - RTX 3060+ - Any CUDA GPU with 8GB+ VRAM Can be trained easily on Google Colab using Unsloth. --- # Limitations - May generate hallucinated information - Performance depends on prompt quality - Not intended for medical/legal critical systems - Knowledge limited to base model training cutoff --- # License This project is released under the Apache 2.0 License. The base model follows Meta's LLaMA 3.2 Community License. --- # Acknowledgements - Unsloth — optimized LLM fine-tuning - Meta AI — LLaMA 3.2 base model - NVIDIA — Nemotron Personas dataset - Hugging Face — model hosting platform --- # Author Made with ❤️ by Sachin016 Hugging Face: https://huggingface.co/Sachin016