gemma2b-dolly-qlora
A QLoRA fine-tuned version of google/gemma-2b-it on the databricks/databricks-dolly-15k instruction-following dataset.
Fine-tuned using QLoRA (Quantized Low-Rank Adaptation): the base model is frozen in 4-bit precision and only the LoRA adapter weights (~13M params out of 2B) are trained. This makes fine-tuning possible on a single free-tier T4 GPU.
Merged full model also available:
adithash/gemma2b-dolly-qlora-merged (base + adapter fused into a single standalone model).
Model Details
| Property | Value |
|---|---|
| Base model | google/gemma-2b-it |
| Fine-tuning method | QLoRA (4-bit NF4 quantized base + LoRA adapters) |
| Dataset | databricks/databricks-dolly-15k |
| Training samples | 14,911 |
| Training steps | 500 (capped for free Colab T4) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 2e-4 |
| Batch size | 2 (effective: 8 with grad accum ×4) |
| Sequence length | 256 tokens |
| Quantization | 4-bit NF4, compute dtype bfloat16 |
| Hardware | Google Colab T4 (16 GB VRAM) |
| Training time | ~2.5 hours |
| Adapter size | ~50 MB |
| Framework | transformers + peft + trl (SFTTrainer) |
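As a rough sanity check, a few of the figures above can be related with simple arithmetic. This is a sketch under stated assumptions: the ~13M trainable-parameter count is the approximate figure from the description, the adapter is assumed to be stored as 32-bit floats (the card does not say), and the scaling factor assumes standard LoRA scaling of alpha / r.

```python
# Values copied from the Model Details table.
lora_rank = 16
lora_alpha = 32
per_device_batch = 2
grad_accum_steps = 4

# Approximate trainable-parameter count quoted in the description (~13M).
trainable_params = 13_000_000
# Assumption: adapter weights are saved as float32 (4 bytes each).
bytes_per_param = 4

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_rank

# Gradient accumulation multiplies the samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum_steps

# Estimated on-disk size of the adapter in megabytes.
adapter_mb = trainable_params * bytes_per_param / 1e6

print(f"LoRA scaling factor: {lora_scaling}")
print(f"Effective batch size: {effective_batch}")
print(f"Estimated adapter size: {adapter_mb:.0f} MB")
```

Under these assumptions the estimate comes out close to the ~50 MB adapter size listed in the table.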
Training Loss
Loss fell quickly over the first 50 steps and kept decreasing more slowly through step 100, the last step reported here:
| Step | Training Loss |
|---|---|
| 25 | 3.60 |
| 50 | 2.65 |
| 75 | 2.32 |
| 100 | 2.22 |
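To make the loss values easier to interpret, they can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, though the card does not state it), in which case perplexity is exp(loss).

```python
import math

# Training loss values copied from the table above.
loss_by_step = {25: 3.60, 50: 2.65, 75: 2.32, 100: 2.22}

# Assumption: loss is mean per-token cross-entropy in nats,
# so perplexity = exp(loss).
perplexity_by_step = {step: math.exp(loss) for step, loss in loss_by_step.items()}

steps = sorted(loss_by_step)
# Loss decrease between consecutive logged steps.
loss_drops = {
    (a, b): round(loss_by_step[a] - loss_by_step[b], 2)
    for a, b in zip(steps, steps[1:])
}

for step in steps:
    print(
        f"step {step:>3}: loss {loss_by_step[step]:.2f}, "
        f"perplexity {perplexity_by_step[step]:.1f}"
    )
for (a, b), drop in loss_drops.items():
    print(f"steps {a}->{b}: loss fell by {drop:.2f}")
```

The per-interval drops shrink from 0.95 to 0.10, which matches a fast early decrease followed by slower improvement.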
Prompt Format
This model uses the Gemma chat template format. Always wrap your inputs correctly:
<start_of_turn>user
Your instruction here<end_of_turn>
<start_of_turn>model
If your prompt includes context (e.g. a passage to summarise), append it to the instruction:
<start_of_turn>user
Summarise the following text.
Context: <your context here><end_of_turn>
<start_of_turn>model
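The two templates above can be produced by one small helper. This is a minimal sketch: the name `build_prompt` is hypothetical, and the blank line before `Context:` follows the `chat` helper used later in this card.

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Wrap an instruction (and optional context) in the Gemma turn markers."""
    user_msg = instruction
    # Append the context only when it contains non-whitespace text.
    if context.strip():
        user_msg = f"{instruction}\n\nContext: {context}"
    return (
        f"<start_of_turn>user\n{user_msg}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


print(build_prompt("Summarise the following text.", "<your context here>"))
```

The returned string ends with the open `model` turn, so generation continues from where the model's reply should begin.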
How to Use
Option A: LoRA Adapter (Recommended)
Lightweight (~50 MB). Loads the frozen base model and attaches the adapter on top.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach LoRA adapter
model = PeftModel.from_pretrained(base_model, "adithash/gemma2b-dolly-qlora")
tokenizer = AutoTokenizer.from_pretrained("adithash/gemma2b-dolly-qlora")

def chat(instruction, context="", max_new_tokens=200, temperature=0.7):
    user_msg = f"{instruction}\n\nContext: {context}" if context.strip() else instruction
    prompt = (
        f"<start_of_turn>user\n{user_msg}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=0.9,
        )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()

# Example
print(chat("Explain what overfitting is in machine learning and how to prevent it."))
Option B: Merged Model (Standalone)
Full model with adapter baked in. No need for the base model separately. Larger (~3 GB) but simpler to load.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "adithash/gemma2b-dolly-qlora-merged",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("adithash/gemma2b-dolly-qlora-merged")

def chat(instruction, max_new_tokens=200, temperature=0.7):
    prompt = (
        f"<start_of_turn>user\n{instruction}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=0.9,
        )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()

print(chat("What is the difference between SQL and NoSQL databases?"))
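For reference, a merged checkpoint of this kind can typically be produced from a LoRA adapter with PEFT's `merge_and_unload()`. The following is a minimal sketch, not necessarily the procedure used to build `adithash/gemma2b-dolly-qlora-merged`. The function name `merge_adapter` and the output directory are hypothetical, and loading the base model in float16 rather than 4-bit is an assumption made here because merging into unquantized weights is the simpler path.

```python
def merge_adapter(
    base_id: str = "google/gemma-2b-it",
    adapter_id: str = "adithash/gemma2b-dolly-qlora",
    output_dir: str = "gemma2b-dolly-qlora-merged-local",  # hypothetical path
) -> str:
    """Fuse the LoRA adapter into the base weights and save a standalone model."""
    # Imports live inside the function so that defining it stays lightweight.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: load the base model unquantized (float16) for merging.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.float16
    )
    model = PeftModel.from_pretrained(base_model, adapter_id)

    # merge_and_unload() folds the LoRA deltas into the base layers and
    # returns a plain transformers model without the PEFT wrappers.
    merged = model.merge_and_unload()

    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(adapter_id).save_pretrained(output_dir)
    return output_dir
```

Calling `merge_adapter()` downloads the base model and needs enough memory for the full float16 weights, so it is heavier than loading the adapter alone.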
Repository Structure
adithash/gemma2b-dolly-qlora/
├── adapter_config.json          # LoRA config (rank, alpha, target modules)
├── adapter_model.safetensors    # Trained LoRA weights (~50 MB)
├── tokenizer.json
├── tokenizer_config.json
└── README.md
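The LoRA settings stored in `adapter_config.json` can be inspected without loading any model weights. This is a small sketch: the helper name `summarize_adapter_config` is hypothetical, and the key names are those PEFT normally writes when serializing a LoRA config. It assumes the file has already been downloaded locally.

```python
import json


def summarize_adapter_config(path: str) -> dict:
    """Return the main LoRA hyperparameters stored in a PEFT adapter_config.json."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    # .get() keeps the helper tolerant of files that omit a field.
    keys = (
        "base_model_name_or_path",
        "peft_type",
        "r",
        "lora_alpha",
        "lora_dropout",
        "target_modules",
    )
    return {key: config.get(key) for key in keys}


# Example usage, assuming the file sits in the current directory:
# print(summarize_adapter_config("adapter_config.json"))
```

If the file matches the Model Details table, `r`, `lora_alpha`, and `lora_dropout` should read 16, 32, and 0.05.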
Intended Use
- ✅ Learning and experimentation with QLoRA fine-tuning
- ✅ Portfolio demonstration of end-to-end fine-tuning pipeline
- ✅ Starting point for domain-specific instruction tuning
- ❌ Not intended for production or commercial use
- ❌ Not suitable for safety-critical applications
Limitations
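- Fine-tuned for only 500 steps as a proof-of-concept on free Colab T4; a full epoch would be ~7,400 steps at batch size 2 (14,911 samples / 2), or about 1,860 optimizer steps at the effective batch size of 8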
- Gemma 2B is a small model, so complex multi-step reasoning will be limited
- Training sequence length capped at 256 tokens, so very long prompts will be truncated
- A newer base model exists: google/gemma-2-2b-it
- Subject to Google's Gemma Terms of Use
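The relationship between the 500-step cap and a full epoch can be worked out from the Model Details table. This is a sketch, assuming the 500 steps are optimizer steps; the card does not say whether steps are counted before or after gradient accumulation.

```python
import math

# Values copied from the Model Details table.
num_samples = 14_911
per_device_batch = 2
grad_accum_steps = 4
max_steps = 500

effective_batch = per_device_batch * grad_accum_steps

# Forward/backward passes (micro-batches of 2) needed to see every sample once.
micro_batches_per_epoch = math.ceil(num_samples / per_device_batch)
# Optimizer updates per epoch when gradients accumulate over 4 micro-batches.
optimizer_steps_per_epoch = math.ceil(num_samples / effective_batch)

# Assumption: the 500 training steps are optimizer steps.
samples_seen = max_steps * effective_batch
epoch_fraction = samples_seen / num_samples

print(f"Micro-batches per epoch: {micro_batches_per_epoch}")
print(f"Optimizer steps per epoch: {optimizer_steps_per_epoch}")
print(f"Samples seen in {max_steps} steps: {samples_seen}")
print(f"Fraction of one epoch: {epoch_fraction:.1%}")
```

Under that assumption the run covers roughly 27% of the dataset. The ~7,400 figure corresponds to counting micro-batches of 2 rather than optimizer steps.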
Training Infrastructure
| Component | Detail |
|---|---|
| Notebook | Google Colab (free tier) |
| GPU | NVIDIA T4, 16 GB VRAM |
| Libraries | transformers 4.x, peft, trl 1.3+, bitsandbytes, accelerate |
| Trainer | SFTTrainer with SFTConfig |
| Gradient checkpointing | Enabled |
| Mixed precision | bfloat16 |
Citation
If you use this model or adapter in your work, please credit the base model:
@article{gemma_2024,
title = {Gemma: Open Models Based on Gemini Research and Technology},
author = {Gemma Team, Google DeepMind},
year = {2024},
url = {https://arxiv.org/abs/2403.08295}
}
Author
Aditya Dey, ML Engineer
🤗 HuggingFace · GitHub