LLaMA 3.2 3B — Nemotron Personas India Fine-tuned

A parameter-efficient fine-tune of LLaMA 3.2 3B, trained on the NVIDIA Nemotron Personas dataset to adapt the model to Indian context and conversational style.

The model was trained with Unsloth, which speeds up LoRA fine-tuning and reduces its memory use.

Model Overview

The fine-tuning is intended to provide:

More natural conversational responses
Indian contextual understanding
Persona-aware dialogue
Human-like assistant interactions
Better response personalization

The model is fine-tuned using LoRA adapters on top of:

unsloth/Llama-3.2-3B-bnb-4bit
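The card does not state the LoRA rank or which layers were adapted. As a rough sketch of what a LoRA adapter adds, the snippet below counts trainable parameters assuming rank r = 16 on all attention and MLP projections (a common Unsloth notebook default, not confirmed for this model) together with the published Llama 3.2 3B dimensions. Each adapted linear layer mapping d_in to d_out gains two low-rank matrices, A (r × d_in) and B (d_out × r), so r·(d_in + d_out) extra parameters, while the 4-bit base weights stay frozen.

```python
# Sketch: count LoRA trainable parameters for Llama 3.2 3B.
# ASSUMPTIONS: rank r=16 and the seven target projections below are
# illustrative defaults; the model card does not state the actual values.

HIDDEN = 3072         # hidden_size of Llama 3.2 3B
INTERMEDIATE = 8192   # MLP intermediate_size
N_LAYERS = 28         # num_hidden_layers
HEAD_DIM = 128        # hidden_size / num_attention_heads (3072 / 24)
N_KV_HEADS = 8        # grouped-query attention: k/v use fewer heads
KV_DIM = HEAD_DIM * N_KV_HEADS  # 1024

# (d_in, d_out) for each projection in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_in -> d_out layer."""
    return r * d_in + d_out * r


def total_lora_params(r: int = 16) -> int:
    """Sum the adapter parameters over all projections and all layers."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in PROJECTIONS.values())
    return per_layer * N_LAYERS


if __name__ == "__main__":
    print(f"{total_lora_params(16):,} trainable LoRA parameters at r=16")
```

Under these assumptions the adapter has 24,313,856 trainable parameters, well under 1% of the roughly 3.2B base parameters, which is why the adapter can be trained and shared separately from the base model.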

Quick Start

Install Requirements

pip install unsloth
pip install torch transformers peft accelerate bitsandbytes

Load Model

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Sachin016/llama-Nemotron-Personas-India-finetuned",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

FastLanguageModel.for_inference(model)

Run Inference

alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

question = "Explain AI in simple words"

inputs = tokenizer(
    [alpaca_prompt.format(question, "", "")],
    return_tensors = "pt"
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens = 128,
    use_cache = True
)

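# skip_special_tokens drops markers such as <|begin_of_text|> from the decoded text
response = tokenizer.batch_decode(outputs, skip_special_tokens = True)[0].split("### Response:")[1].strip()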

print(response)
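Splitting on `### Response:` and taking index `[1]` raises an `IndexError` if the marker is missing, and it keeps anything the model generates after its answer (for example, a new `### Instruction:` block). A more defensive post-processing step might look like the sketch below. The helper names are hypothetical, and the stop markers are assumptions based on the Alpaca layout in this card plus the Llama 3 special tokens `<|end_of_text|>` and `<|eot_id|>`, in case special tokens are kept during decoding.

```python
# Sketch of defensive post-processing for Alpaca-style generations.
# Helper names are hypothetical; stop markers are assumptions based on the
# Alpaca template in this card and Llama 3 special tokens.

ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

RESPONSE_MARKER = "### Response:"
STOP_MARKERS = ("### Instruction:", "### Input:", "<|end_of_text|>", "<|eot_id|>")


def build_prompt(instruction: str, context: str = "") -> str:
    """Fill the template, leaving the response slot empty for the model."""
    return ALPACA_PROMPT.format(instruction, context, "")


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, cut at any stop marker."""
    _, marker, answer = decoded.partition(RESPONSE_MARKER)
    if not marker:
        # Marker missing: fall back to the whole string instead of raising.
        answer = decoded
    for stop in STOP_MARKERS:
        answer = answer.split(stop, 1)[0]
    return answer.strip()


if __name__ == "__main__":
    fake_output = build_prompt("Explain AI in simple words") + (
        "AI is software that learns patterns from data.\n\n### Instruction:\nNext"
    )
    print(extract_response(fake_output))
```

Using the first marker matters because the template contains exactly one `### Response:`; if the model runs on and invents another instruction/response pair, that later text is discarded by the stop markers.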

Model Details

| Property | Value |
|---|---|
| Base Model | unsloth/Llama-3.2-3B-bnb-4bit |
| Fine-tuning Method | LoRA |
| Quantization | 4-bit (bitsandbytes) |
| Framework | Unsloth + Hugging Face PEFT |
| Max Sequence Length | 2048 tokens |
| Language | English |
| Training Type | Instruction Fine-tuning |
| Domain | Conversational AI |
| Specialization | Persona-based responses |

Dataset

This model was fine-tuned using:

NVIDIA Nemotron Personas Dataset

The dataset helps improve:

Conversational quality
Personality consistency
Human-like dialogue generation
Contextual response handling

Training Configuration

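from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = "outputs",            # placeholder; the original config does not state it
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,   # effective batch size: 2 x 4 = 8 sequences per device
    warmup_steps = 5,
    num_train_epochs = 3,
    learning_rate = 2e-4,
    optim = "adamw_8bit",              # 8-bit AdamW from bitsandbytes to save memory
)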
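The configuration does not set `lr_scheduler_type`, so Hugging Face `TrainingArguments` falls back to its default `linear` schedule: the learning rate ramps up over `warmup_steps`, then decays linearly to zero by the last step. The pure-Python sketch below reproduces that shape and the effective batch size for one GPU. The dataset size is hypothetical, since the card does not report it, and the step count is an approximation because exact rounding differs between library versions.

```python
# Sketch of the default "linear" schedule used by Hugging Face TrainingArguments
# when lr_scheduler_type is not set, with this card's hyperparameters.
# NUM_EXAMPLES is hypothetical: the card does not report the dataset size.
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
WARMUP_STEPS = 5
EPOCHS = 3
PEAK_LR = 2e-4
NUM_EXAMPLES = 1_000  # hypothetical, for illustration only


def effective_batch_size(num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer update."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_devices


def approx_total_steps(num_examples: int) -> int:
    """Approximate optimizer steps on one device (exact rounding varies by version)."""
    batches_per_epoch = math.ceil(num_examples / PER_DEVICE_BATCH)
    updates_per_epoch = max(batches_per_epoch // GRAD_ACCUM, 1)
    return updates_per_epoch * EPOCHS


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = (total_steps - step) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, remaining)


if __name__ == "__main__":
    total = approx_total_steps(NUM_EXAMPLES)
    print("effective batch size:", effective_batch_size())
    print("total steps:", total)
    for step in (0, WARMUP_STEPS, total // 2, total):
        print(step, f"{lr_at_step(step, total):.2e}")
```

With only 5 warmup steps, the peak rate of 2e-4 is reached almost immediately, so most of training runs on the decaying part of the schedule.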

Prompt Format

This model uses the Alpaca instruction format.

### Instruction:
<your instruction>

### Input:
<optional context>

### Response:
<model output>
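During instruction fine-tuning, each training record is rendered into this same template with the target answer filled in, and an end-of-sequence token is commonly appended (as in Unsloth's Alpaca example notebooks) so the model learns where to stop. A minimal sketch, assuming Llama 3.2's base EOS string `<|end_of_text|>` and hypothetical field names `instruction`, `input` and `output`; the card does not document how persona records were mapped onto the template.

```python
# Sketch: render one instruction-tuning record into the Alpaca format above.
# ASSUMPTIONS: the field names and the EOS string are illustrative; in real
# code, take the EOS string from tokenizer.eos_token instead of hard-coding it.

ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"  # assumed Llama 3.2 base EOS; prefer tokenizer.eos_token


def format_example(record: dict, eos_token: str = EOS_TOKEN) -> str:
    """Fill instruction, optional context and target output, then mark the end."""
    return ALPACA_PROMPT.format(
        record["instruction"],
        record.get("input", ""),  # the Input section stays even when empty
        record["output"],
    ) + eos_token


if __name__ == "__main__":
    sample = {
        "instruction": "Introduce yourself",
        "input": "",
        "output": "Namaste! I am a teacher from Pune.",
    }
    print(format_example(sample))
```

Keeping the empty `### Input:` section matches the Quick Start inference code, which also passes an empty string for the input, so training and inference prompts share one layout.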

Upload to Hugging Face

from unsloth import FastLanguageModel
from huggingface_hub import login

# Authenticate once; later push_to_hub calls reuse the stored token.
login(token = "YOUR_HUGGINGFACE_TOKEN")

# Locally saved LoRA checkpoint (here, a Google Drive folder mounted in Colab)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/content/drive/MyDrive/llama_project/finetuned_model",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

repo_id = "Sachin016/llama-Nemotron-Personas-India-finetuned"

# Upload the LoRA adapter weights/config and the tokenizer files
model.push_to_hub(repo_id)
tokenizer.push_to_hub(repo_id)
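Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. One alternative, sketched below, reads it from the `HF_TOKEN` environment variable (which `huggingface_hub` also recognizes on its own) and fails early with a clear message. The helper name `get_hf_token` is hypothetical.

```python
# Sketch: resolve a Hugging Face token without hard-coding it.
# get_hf_token is a hypothetical helper; HF_TOKEN is the environment variable
# that huggingface_hub itself reads.
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the token from HF_TOKEN, raising if it is missing or blank."""
    source = os.environ if env is None else env
    token = source.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before uploading.")
    return token


if __name__ == "__main__":
    # Usage (requires huggingface_hub):
    #   from huggingface_hub import login
    #   login(token=get_hf_token())
    try:
        print("token found:", bool(get_hf_token()))
    except RuntimeError as err:
        print(err)
```

Accepting an optional mapping keeps the helper easy to test without touching the real environment.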

Recommended Hardware

Recommended GPUs:

NVIDIA T4
NVIDIA A100
RTX 3060+
Any CUDA GPU with 8 GB or more of VRAM

The model can also be fine-tuned on Google Colab with Unsloth.
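As a back-of-envelope check on the VRAM guidance above, the sketch below estimates the size of the quantized base weights from the published Llama 3.2 3B dimensions. It assumes, as with bitsandbytes 4-bit loading in Transformers, that the linear layers are stored at 4 bits while the tied embedding and the norm weights stay in 16-bit. It ignores quantization constants, the LoRA adapter, activations, the KV cache and optimizer state, so real memory use is higher.

```python
# Back-of-envelope size of Llama 3.2 3B weights loaded in 4-bit.
# ASSUMPTIONS: linear layers at 4 bits, tied embedding and norms at 16 bits;
# quantization constants, LoRA weights, activations, KV cache and optimizer
# state are ignored, so actual VRAM use is higher.

VOCAB = 128_256
HIDDEN = 3072
INTERMEDIATE = 8192
N_LAYERS = 28
KV_DIM = 1024  # 8 key/value heads x 128 head_dim (grouped-query attention)


def linear_params_per_layer() -> int:
    """Weights of the q/k/v/o attention and gate/up/down MLP projections."""
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v
    mlp = 3 * HIDDEN * INTERMEDIATE                          # gate, up, down
    return attention + mlp


def param_counts() -> dict:
    linear = N_LAYERS * linear_params_per_layer()
    embedding = VOCAB * HIDDEN               # shared with lm_head (tied)
    norms = N_LAYERS * 2 * HIDDEN + HIDDEN   # two RMSNorms per layer + final norm
    return {"linear": linear, "embedding": embedding, "norms": norms}


def weight_bytes(linear_bits: int = 4, other_bits: int = 16) -> float:
    """Bytes for weights when linear layers use linear_bits and the rest other_bits."""
    counts = param_counts()
    quantized = counts["linear"] * linear_bits / 8
    unquantized = (counts["embedding"] + counts["norms"]) * other_bits / 8
    return quantized + unquantized


if __name__ == "__main__":
    total = sum(param_counts().values())
    print(f"parameters: {total:,}")
    print(f"4-bit weights: ~{weight_bytes() / 1e9:.2f} GB")
    print(f"16-bit weights: ~{weight_bytes(16, 16) / 1e9:.2f} GB")
```

Under these assumptions the weights take roughly 2.2 GB in 4-bit versus about 6.4 GB in 16-bit, which leaves headroom on an 8 GB GPU for activations and the LoRA training state.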

Limitations

May generate hallucinated (plausible but incorrect) information
Output quality depends on prompt quality
Not intended for critical medical or legal applications
Knowledge is limited to the base model's training cutoff

License

This project is released under the Apache 2.0 License.

The base model is subject to Meta's Llama 3.2 Community License.

Acknowledgements

Unsloth — optimized LLM fine-tuning
Meta AI — LLaMA 3.2 base model
NVIDIA — Nemotron Personas dataset
Hugging Face — model hosting platform

Author

Made with ❤️ by Sachin016

Hugging Face: https://huggingface.co/Sachin016


Model tree for Sachin016/llama-Nemotron-Personas-India-finetuned

Base model: meta-llama/Llama-3.2-3B
Quantized: unsloth/Llama-3.2-3B-bnb-4bit
Adapter: this model