--- library_name: peft license: other base_model: unsloth/SmolLM2-360M-Instruct tags: - base_model:adapter:unsloth/SmolLM2-360M-Instruct - llama-factory - lora - transformers pipeline_tag: text-generation model-index: - name: train_2026-05-08-20-46-08 results: [] datasets: - sapbot/gemma-3n-4b-it-423x language: - en - ru --- # Gemma 3n 4B Distill SmolLM2 360M Instruct **WARNING:** REMEMBER TO ADD CUSTOM SYSTEM PROMPT RESEMBLING GEMMA 3N 4B IF YOU WANT MODEL TO KNOW THAT IT'S IT, BECAUSE THE ONLY THING CHANGED IS STYLE, IN DATASET THERE WERE NO SIGNS OF TEACHER MODEL. HAVE FUN. This model is a fine-tuned version of [unsloth/SmolLM2-360M-Instruct](https://huggingface.co/unsloth/SmolLM2-360M-Instruct) on the [sapbot/gemma-3n-4b-it-423x](https://huggingface.co/datasets/sapbot/gemma-3n-4b-it-423x) dataset. ## Model description This model was distilled from Gemma 3n 4B, essentially as a demonstration of purpose of my datasets. ## Intended uses & limitations As a demonstration of my datasets Limitations are easy: it's still SmolLM2, just with... a little bit of google taste. ## Training and evaluation data ![Loss Graph](training_loss.png) ## Training procedure Used default parameters from LLaMA-Factory. P.S. Waiting for Unsloth Studio official ROCm support, LLaMA-Factory was pain to use as inexpirienced user in that field. ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 5e-05 - train_batch_size: 2 - eval_batch_size: 8 - seed: 42 - gradient_accumulation_steps: 8 - total_train_batch_size: 16 - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments - lr_scheduler_type: cosine - num_epochs: 3.0 - mixed_precision_training: Native AMP ### Device Trained entirely on one `AMD ATI Radeon RX 6600` just for 30min. It was NOT QLoRA, but just a LoRA! What to say, really fast! ### Framework versions - PEFT 0.18.1 - Transformers 5.2.0 - Pytorch 2.6.0+rocm6.1 - Datasets 4.0.0 - Tokenizers 0.22.2