Gemma 3n 4B Distill SmolLM2 360M Instruct

WARNING: REMEMBER TO ADD CUSTOM SYSTEM PROMPT RESEMBLING GEMMA 3N 4B IF YOU WANT MODEL TO KNOW THAT IT'S IT, BECAUSE THE ONLY THING CHANGED IS STYLE, IN DATASET THERE WERE NO SIGNS OF TEACHER MODEL. HAVE FUN.

This model is a fine-tuned version of unsloth/SmolLM2-360M-Instruct on the sapbot/gemma-3n-4b-it-423x dataset.

Model description

This model was distilled from Gemma 3n 4B, essentially as a demonstration of purpose of my datasets.

Intended uses & limitations

As a demonstration of my datasets

Limitations are easy: it's still SmolLM2, just with... a little bit of google taste.

Training and evaluation data

Loss Graph

Training procedure

Used default parameters from LLaMA-Factory.

P.S. Waiting for Unsloth Studio official ROCm support, LLaMA-Factory was pain to use as inexpirienced user in that field.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  • lr_scheduler_type: cosine
  • num_epochs: 3.0
  • mixed_precision_training: Native AMP

Device

Trained entirely on one AMD ATI Radeon RX 6600 just for 30min. It was NOT QLoRA, but just a LoRA! What to say, really fast!

Framework versions

  • PEFT 0.18.1
  • Transformers 5.2.0
  • Pytorch 2.6.0+rocm6.1
  • Datasets 4.0.0
  • Tokenizers 0.22.2
Downloads last month
55
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sapbot/gemma-3n-4b-it-distill-smollm2-360m

Adapter
(157)
this model
Quantizations
1 model

Dataset used to train sapbot/gemma-3n-4b-it-distill-smollm2-360m