LLaMA 3.2 3B: Nemotron Personas India Fine-tuned
A parameter-efficient (LoRA) fine-tune of LLaMA 3.2 3B, trained on the NVIDIA Nemotron Personas dataset and tuned for Indian context and conversational style.
This model is built using Unsloth for faster and memory-efficient LoRA fine-tuning.
Model Overview
This model is designed to provide:
- More natural conversational responses
- Understanding of Indian context
- Persona-aware dialogue
- Human-like assistant interactions
- Better response personalization
The model is fine-tuned using LoRA adapters on top of:
unsloth/Llama-3.2-3B-bnb-4bit
Quick Start
Install Requirements
pip install unsloth
pip install torch transformers peft accelerate bitsandbytes
Load Model
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Sachin016/llama-Nemotron-Personas-India-finetuned",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)
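As an alternative to the Unsloth loader above, the adapter can also be attached to the base model with plain PEFT and Transformers. The following is a minimal sketch, not the author's documented workflow: `load_with_peft` is a hypothetical helper name, `device_map="auto"` is an assumed setting, and loading the tokenizer from the adapter repository assumes it was pushed there alongside the adapter.

```python
BASE_MODEL_ID = "unsloth/Llama-3.2-3B-bnb-4bit"
ADAPTER_ID = "Sachin016/llama-Nemotron-Personas-India-finetuned"


def load_with_peft(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the 4-bit base model and attach the LoRA adapter (hypothetical helper)."""
    # Imported lazily so this file can be imported without the GPU stack installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The base checkpoint is pre-quantized; bitsandbytes must be installed.
    base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base_model, adapter_id)
    # Assumption: the tokenizer was uploaded to the adapter repository.
    tokenizer = AutoTokenizer.from_pretrained(adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_with_peft()` downloads the weights and is expected to need a CUDA GPU with `bitsandbytes` available.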
Run Inference
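alpaca_prompt = """### Instruction:
{}
### Input:
{}
### Response:
{}"""

question = "Explain AI in simple words"

# Leave the input and response slots empty; the model fills in the response.
inputs = tokenizer(
    [alpaca_prompt.format(question, "", "")],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens = 128,
    use_cache = True,
)

# Skip special tokens (BOS/EOS) so they do not leak into the printed response.
decoded = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0]
response = decoded.split("### Response:")[1].strip()
print(response)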
Model Details
| Property | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-3B-bnb-4bit |
| Fine-tuning Method | LoRA |
| Quantization | 4-bit |
| Framework | Unsloth + HuggingFace PEFT |
| Max Sequence Length | 2048 |
| Language | English |
| Training Type | Instruction Fine-tuning |
| Domain | Conversational AI |
| Specialization | Persona-based responses |
Dataset
This model was fine-tuned using:
- NVIDIA Nemotron Personas Dataset
The dataset helps improve:
- Conversational quality
- Personality consistency
- Human-like dialogue generation
- Contextual response handling
Training Configuration
TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 5,
    num_train_epochs = 3,
    learning_rate = 2e-4,
    optim = "adamw_8bit",
)
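With these settings, each optimizer step accumulates gradients over 4 micro-batches of 2 examples, for an effective batch size of 8 per device. The sketch below works through that arithmetic. The dataset size and the single-GPU setup are placeholder assumptions, since the card does not state them, and the Trainer's own step rounding may differ slightly for sizes that do not divide evenly.

```python
import math

# Values taken from the training configuration above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_train_epochs = 3

# Assumptions for illustration only; not stated in the model card.
num_devices = 1
num_examples = 10_000

# Examples consumed per optimizer update.
effective_batch = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
```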
Prompt Format
This model uses the Alpaca instruction format.
### Instruction:
<your instruction>
### Input:
<optional context>
### Response:
<model output>
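The template above can be wrapped in two small helpers: one that fills in the instruction and optional input, and one that pulls the generated text out of a decoded output. This is a minimal sketch; `build_prompt` and `extract_response` are hypothetical names, not part of Unsloth or this repository, and the template string mirrors the `alpaca_prompt` used in the Quick Start.

```python
ALPACA_TEMPLATE = """### Instruction:
{}
### Input:
{}
### Response:
{}"""

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, context: str = "") -> str:
    """Fill the template, leaving the response slot empty for generation."""
    return ALPACA_TEMPLATE.format(instruction, context, "")


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or "" if it is absent."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if marker else ""


prompt = build_prompt("Explain AI in simple words")
print(prompt)
```

Returning an empty string when the marker is missing avoids the `IndexError` that a bare `split(...)[1]` would raise on unexpected output.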
Upload to Hugging Face
from huggingface_hub import login
login(token="YOUR_HUGGINGFACE_TOKEN")

from unsloth import FastLanguageModel

# Load the fine-tuned model from its local save directory.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/content/drive/MyDrive/llama_project/finetuned_model",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

# Push the model and the tokenizer to the same Hub repository.
model.push_to_hub(
    "Sachin016/llama-Nemotron-Personas-India-finetuned",
    token="YOUR_HUGGINGFACE_TOKEN"
)
tokenizer.push_to_hub(
    "Sachin016/llama-Nemotron-Personas-India-finetuned",
    token="YOUR_HUGGINGFACE_TOKEN"
)
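Pasting a token directly into a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched here, reads the token from an environment variable. `HF_TOKEN` is the variable name that `huggingface_hub` recognizes; `get_hf_token` and `login_from_env` are hypothetical helpers introduced only for illustration.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read the Hub token from the environment, failing loudly if it is unset."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


def login_from_env() -> None:
    """Log in to the Hugging Face Hub using the token from the environment."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import login

    login(token=get_hf_token())
```

The same `get_hf_token()` value can be passed as the `token` argument of the `push_to_hub` calls above.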
Recommended Hardware
Recommended GPUs:
- NVIDIA T4
- NVIDIA A100
- RTX 3060+
- Any CUDA GPU with 8GB+ VRAM
The model can be fine-tuned on Google Colab using Unsloth.
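As a rough check on the 8GB figure, the weight footprint can be estimated from the parameter count and bit width. This back-of-the-envelope sketch assumes a nominal 3 billion parameters and ignores the KV cache, activations, LoRA weights, and any layers kept in higher precision, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: nominal parameter count taken from the "3B" in the model name.
NOMINAL_PARAMS = 3e9

for bits in (4, 16):
    gib = weight_memory_gib(NOMINAL_PARAMS, bits)
    print(f"{bits}-bit weights: ~{gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take well under 2 GiB, which leaves headroom on an 8GB card for the cache and activations.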
Limitations
- May generate hallucinated information
- Performance depends on prompt quality
- Not intended for medical/legal critical systems
- Knowledge limited to base model training cutoff
License
This project is released under the Apache 2.0 License.
The base model follows Meta's LLaMA 3.2 Community License.
Acknowledgements
- Unsloth: optimized LLM fine-tuning
- Meta AI: LLaMA 3.2 base model
- NVIDIA: Nemotron Personas dataset
- Hugging Face: model hosting platform
Author
Made with ❤️ by Sachin016
Hugging Face: https://huggingface.co/Sachin016