How to use with Unsloth
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for D1zzYzz/GRIT-Full-llama-3.2-3B-alpaca-r16 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for D1zzYzz/GRIT-Full-llama-3.2-3B-alpaca-r16 to start chatting
Using Unsloth Studio on Hugging Face Spaces
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for D1zzYzz/GRIT-Full-llama-3.2-3B-alpaca-r16 to start chatting
Load model with FastModel
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="D1zzYzz/GRIT-Full-llama-3.2-3B-alpaca-r16",
    max_seq_length=2048,
)
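
The model was fine-tuned on tatsu-lab/alpaca, so prompts may work best in an Alpaca-style instruction format. This card does not say which template was used during training. The sketch below therefore assumes the standard Stanford Alpaca template. `format_alpaca_prompt` is a hypothetical helper, and the base Instruct model's own chat template may be the better choice in practice.

```python
# Assumed: the standard Stanford Alpaca prompt templates (not confirmed by this card).
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the (assumed) Alpaca template; the Input section is added only when context is given."""
    if input_text:
        return ALPACA_WITH_INPUT.format(instruction=instruction, input=input_text)
    return ALPACA_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    print(format_alpaca_prompt("Give three tips for staying healthy."))
```

The resulting string can be tokenized with the `tokenizer` returned by `FastModel.from_pretrained` and passed to `model.generate`, as with any Hugging Face causal language model.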

unsloth/Llama-3.2-3B-Instruct-bnb-4bit Fine-tuned with GRIT and QLoRA (Unsloth)

This model is a fine-tuned version of unsloth/Llama-3.2-3B-Instruct-bnb-4bit using the GRIT (Geometric Reprojection Instruction Tuning) algorithm and QLoRA on the tatsu-lab/alpaca dataset.

The base model is quantized to 4-bit (NF4) and optimized with Unsloth to enable efficient fine-tuning.
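
NF4 stores each weight as a 4-bit index into 16 levels placed at quantiles of a standard normal distribution, with one absmax scale per block of weights. The pure-Python sketch below illustrates the idea. It is not the bitsandbytes kernel. The offset 0.9677083 and the block size of 64 are assumptions that mirror the reference construction as I understand it.

```python
from statistics import NormalDist


def nf4_codebook(offset: float = 0.9677083) -> list[float]:
    """16 NF4-style levels: 8 positive and 7 negative normal quantiles plus an exact 0, scaled to [-1, 1]."""
    nd = NormalDist()

    def linspace(a: float, b: float, n: int) -> list[float]:
        return [a + (b - a) * i / (n - 1) for i in range(n)]

    # Drop the last point (p = 0.5) so that zero is added exactly once below.
    positive = [nd.inv_cdf(p) for p in linspace(offset, 0.5, 9)[:-1]]
    negative = [-nd.inv_cdf(p) for p in linspace(offset, 0.5, 8)[:-1]]
    levels = sorted(positive + [0.0] + negative)
    scale = max(abs(v) for v in levels)
    return [v / scale for v in levels]


LEVELS = nf4_codebook()


def quantize_blockwise(values: list[float], block_size: int = 64) -> tuple[list[int], list[float]]:
    """Absmax-scale each block, then store the index (0..15) of the nearest codebook level."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # avoid dividing by zero for all-zero blocks
        scales.append(absmax)
        for v in block:
            x = v / absmax
            codes.append(min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - x)))
    return codes, scales


def dequantize_blockwise(codes: list[int], scales: list[float], block_size: int = 64) -> list[float]:
    """Map each 4-bit code back to its level and rescale by its block's absmax."""
    return [LEVELS[c] * scales[i // block_size] for i, c in enumerate(codes)]
```

In QLoRA the frozen base weights stay in this 4-bit form and are dequantized on the fly (here to fp16 compute), while only the LoRA adapters are trained.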

🚀 Training Details

GRIT Algorithm

  • K-FAC Updates: Every 100 steps (adaptive) for second-order preconditioning.
  • Neural Reprojection: Every 100 steps (adaptive) for rank optimization.
  • Rank Adaptation: Enabled (Threshold: 0.99, Min Rank: 4).
  • Optimized LoRA Modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj']
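
The card does not define what the 0.99 rank-adaptation threshold measures. The sketch below assumes it is a cumulative spectral-energy cutoff over the singular values of a LoRA update, which is a common choice but is not confirmed here. It also shows a fixed 100-step trigger. `select_rank` and `should_run` are hypothetical names, and the "adaptive" adjustment of the interval is not modeled.

```python
def should_run(step: int, interval: int = 100) -> bool:
    """True on steps where a periodic update (K-FAC refresh or reprojection) would fire."""
    return step > 0 and step % interval == 0


def select_rank(singular_values: list[float], threshold: float = 0.99, min_rank: int = 4) -> int:
    """Smallest k whose top-k singular values keep `threshold` of the total energy (sum of s^2),
    clamped to at least `min_rank` and at most the number of singular values."""
    energies = sorted((s * s for s in singular_values), reverse=True)
    total = sum(energies)
    if total == 0:
        return min(min_rank, len(energies))
    kept, k = 0.0, 0
    for e in energies:
        kept += e
        k += 1
        if kept >= threshold * total:
            break
    return min(max(k, min_rank), len(energies))
```

With singular values `[10.0, 3.0, 1.0, 0.1, 0.01]`, two directions already keep over 99% of the energy, so `select_rank` returns 2 with `min_rank=1` and 4 under this card's minimum rank of 4.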

Fine-tuning Configuration

  • Base Model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
  • Quantization: 4-bit (NF4) with fp16 compute.
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Batch Size: 8 (per device)
  • Gradient Accumulation: 4 (Effective batch = 32)
  • Learning Rate: 2.0e-05
  • Precision: fp16 mixed precision
  • Sequence Length: 512 tokens
  • Gradient Checkpointing: Enabled
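
The effective batch size of 32 corresponds to 8 samples per device × 4 accumulation steps on a single device. LoRA also scales its low-rank update by alpha / r, which here is 32 / 16 = 2. The sketch below collects the listed hyperparameters in a hypothetical dataclass and derives both quantities. The field names are illustrative, and this is not the training script used for this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GritQLoRAConfig:
    """Hyperparameters listed on this card (field names are illustrative, not a real API)."""
    lora_rank: int = 16
    lora_alpha: int = 32
    per_device_batch_size: int = 8
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2.0e-5
    max_seq_length: int = 512
    target_modules: tuple[str, ...] = ("q_proj", "k_proj", "v_proj", "o_proj")

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Samples contributing to one optimizer step.
        return self.per_device_batch_size * self.gradient_accumulation_steps * num_devices

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_rank
```

Accumulating gradients over 4 micro-batches of 8 gives the statistics of a batch of 32 while only holding 8 sequences' activations in memory at once. Gradient checkpointing reduces that activation memory further.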

Performance Improvements

  • ✅ Faster Convergence: K-FAC preconditioning aligns updates with curvature.
  • ✅ Memory-Efficient: 4-bit quantization (QLoRA) and gradient checkpointing used.
  • ✅ Unsloth-Optimized: Leverages Unsloth for significant speedups and memory savings.
  • ✅ Adaptive Rank: Dynamically prunes LoRA rank to improve parameter efficiency.

📊 Training Metrics

  • Total Steps: 732
  • Final Loss: 6.615
  • BLEU (val): not reported
  • Trainable Params: 2,621,440
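
LoRA adds r × (d_in + d_out) trainable parameters per adapted projection. The sketch below assumes Llama 3.2 3B's attention shapes: hidden size 3072, grouped-query K/V projections of width 1024 (8 KV heads × head dimension 128), and 28 decoder layers. Under that assumption, a uniform rank of 16 on the four listed modules would give 9,175,040 parameters, and the minimum rank of 4 would give 2,293,760. The reported 2,621,440 falls between these bounds, which would be consistent with rank adaptation having reduced some ranks. The card does not say how the count was measured, though.

```python
def lora_param_count(shapes: dict[str, tuple[int, int]], rank: int, num_layers: int) -> int:
    """Each adapted (d_in -> d_out) projection adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed Llama 3.2 3B attention shapes (d_in, d_out); K/V are narrower due to grouped-query attention.
LLAMA_32_3B_ATTN = {
    "q_proj": (3072, 3072),
    "k_proj": (3072, 1024),
    "v_proj": (3072, 1024),
    "o_proj": (3072, 3072),
}
NUM_LAYERS = 28  # assumed number of decoder layers
```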

πŸ“ Algorithm Details

  • K-FAC preconditioning (natural gradient) and neural reprojection, as described in the GRIT method.
  • Memory Efficient: the K-FAC covariance matrices are kept on the CPU to reduce GPU memory load.
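
For a dense layer y = W a, K-FAC approximates the Fisher block as a Kronecker product of the input covariance A = E[a aᵀ] and the output-gradient covariance S = E[g gᵀ]. The natural-gradient direction for the weight gradient ∇W then becomes S⁻¹ ∇W A⁻¹. The pure-Python sketch below computes one damped step for a single layer. The damping value and helper names are assumptions. The sketch does not reproduce how GRIT applies K-FAC to the LoRA factors, or its CPU offloading. A and S are presumably the covariance matrices the card refers to.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan elimination with partial pivoting (fine for small, well-conditioned matrices)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def damped_covariance(vectors: list[list[float]], damping: float) -> Matrix:
    """Batch estimate of E[v v^T] plus damping * I."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n + (damping if i == j else 0.0)
             for j in range(d)] for i in range(d)]


def weight_gradient(inputs: list[list[float]], output_grads: list[list[float]]) -> Matrix:
    """dL/dW for y = W a, averaged over the batch: mean of g a^T (shape out x in)."""
    n = len(inputs)
    return [[sum(g[i] * a[j] for a, g in zip(inputs, output_grads)) / n
             for j in range(len(inputs[0]))] for i in range(len(output_grads[0]))]


def kfac_precondition(grad_w: Matrix, inputs: list[list[float]],
                      output_grads: list[list[float]], damping: float = 1e-2) -> Matrix:
    """Natural-gradient direction S^-1 @ grad_w @ A^-1 under the K-FAC approximation."""
    a_cov = damped_covariance(inputs, damping)        # (in x in) input covariance A
    s_cov = damped_covariance(output_grads, damping)  # (out x out) gradient covariance S
    return matmul(matmul(inverse(s_cov), grad_w), inverse(a_cov))
```

Only the two small factors are inverted (in × in and out × out) rather than the full (in·out) × (in·out) Fisher block, which is what makes the second-order step affordable.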

πŸ† Results

The GRIT method is reported to converge faster and train more stably than standard LoRA or full fine-tuning, which makes it well suited to efficient single-epoch training. Unsloth further accelerates this process. No benchmark results for this specific checkpoint are reported on this card.

πŸ“ Citation

If you use this model, please cite the original GRIT paper as well as this model:

@misc{grit-lora-Llama-3.2-3B-Instruct-bnb-4bit-alpaca,
  title={unsloth/Llama-3.2-3B-Instruct-bnb-4bit Fine-tuned with GRIT on tatsu-lab/alpaca},
  author={D1zzYzz},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/D1zzYzz/GRIT-Full-llama-3.2-3B-alpaca-r16}
}

βš–οΈ License

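This model is released under the Apache 2.0 license. Note that the base model, Llama 3.2, is distributed under the Llama 3.2 Community License, whose terms also apply to derivatives such as this one.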
