---
base_model: google/gemma-4-31B-it
library_name: peft
model_name: gemma4-31b-it-glimmer-rp-r16a32
tags:
- base_model:adapter:google/gemma-4-31B-it
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---

# gemma4-31b-it-glimmer-rp-r16a32

This model is a LoRA adapter fine-tuned from [google/gemma-4-31B-it](https://huggingface.co/google/gemma-4-31B-it).

**W&B run:** [https://wandb.ai/cooawoo-personal/Gemma4-31B/runs/nbmb3v4h](https://wandb.ai/cooawoo-personal/Gemma4-31B/runs/nbmb3v4h)

## Training procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning rate | `1e-05` |
| LR scheduler | rex (custom; max_lr=1e-5, min_lr=1e-6, warmup_ratio=0.05) |
| Per-device batch size | 1 |
| Gradient accumulation | 4 |
| Effective batch size | 4 |
| Epochs | 1 |
| Max sequence length | 6144 |
| Optimizer | `paged_adamw_8bit` |
| Warmup ratio | 0.05 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Loss type | nll |
| Assistant-only loss | yes |
| Chunked cross-entropy | yes |

### LoRA configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 16 |
| Alpha | 32 |
| Target modules | `.*language_model\.layers\.\d+\.(self_attn\.(q\|k\|v\|o)_proj\|mlp\.(gate\|up\|down)_proj)$` |
| Quantization | 4-bit (nf4) |

### Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
|---------|--------:|-------------:|-----------------:|
| writing_critique.jsonl | 1,586 | 1,317,233 | 599,216 |
| instruct.jsonl | 962 | 933,867 | 838,340 |
| marvin_style_bible.jsonl | 2,549 | 11,096,548 | 10,492,459 |
| rp_generation_mistral.jsonl | 255 | 825,572 | 375,698 |
| rp_analysis.jsonl | 244 | 725,115 | 177,861 |
| rp_generation_final.jsonl | 129 | 513,974 | 240,019 |
| **Total** | **5,725** | **15,412,309** | **12,723,593** |
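
The dataset figures above are easier to read as ratios. The following is a minimal sketch, not part of the training code, that recomputes the totals, the share of tokens that contribute to the loss in each dataset, and a rough lower bound on optimizer steps. The variable names are illustrative, and the step bound assumes every token is seen exactly once in the single epoch and that each step holds at most `max sequence length × effective batch size` tokens.

```python
# Figures copied from the dataset statistics table:
# name -> (samples, total tokens, trainable tokens)
DATASETS = {
    "writing_critique.jsonl": (1_586, 1_317_233, 599_216),
    "instruct.jsonl": (962, 933_867, 838_340),
    "marvin_style_bible.jsonl": (2_549, 11_096_548, 10_492_459),
    "rp_generation_mistral.jsonl": (255, 825_572, 375_698),
    "rp_analysis.jsonl": (244, 725_115, 177_861),
    "rp_generation_final.jsonl": (129, 513_974, 240_019),
}

MAX_LENGTH = 6144        # max sequence length from the hyperparameter table
EFFECTIVE_BATCH = 1 * 4  # per-device batch size x gradient accumulation

# Column-wise sums: (samples, total tokens, trainable tokens).
totals = tuple(sum(column) for column in zip(*DATASETS.values()))

# Fraction of each dataset's tokens that are trainable (assistant-only loss).
shares = {
    name: trainable / total
    for name, (_, total, trainable) in DATASETS.items()
}
overall_share = totals[2] / totals[1]

# Upper bound on tokens per optimizer step, and the resulting lower bound
# on the number of steps (ceiling division). This is an estimate only.
tokens_per_step_cap = MAX_LENGTH * EFFECTIVE_BATCH
min_steps = -(-totals[1] // tokens_per_step_cap)

if __name__ == "__main__":
    for name, share in sorted(shares.items(), key=lambda item: item[1]):
        print(f"{name:30s} {share:6.1%} trainable")
    print(f"{'overall':30s} {overall_share:6.1%} trainable")
    print(f"at least {min_steps} optimizer steps")
```

The per-dataset shares show how much of each source is masked out by assistant-only loss, which is useful when judging how strongly each dataset actually influences the adapter.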

### Training config

```yaml
model_name_or_path: google/gemma-4-31B-it
data_config: data.yaml
prepared_dataset: prepared_packed
output_dir: gemma4-31b-it-glimmer-rp-r16a32
chat_template_path: chat_template_with_channel.jinja
attn_implementation: flex_attention
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_cce: true
chunked_mlp: true
chunked_mlp_chunks: 16
dataloader_num_workers: 2
dataloader_pin_memory: true
model_parallel: true
max_memory:
  0: 16GiB
  1: 24GiB
max_length: 6144
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
pad_to_multiple_of: 128
use_peft: true
load_in_4bit: true
bnb_4bit_quant_type: nf4
lora_r: 16
lora_alpha: 32
lora_dropout: 0.0
use_rslora: false
lora_target_modules: .*language_model\.layers\.\d+\.(self_attn\.(q|k|v|o)_proj|mlp\.(gate|up|down)_proj)$
learning_rate: 1.0e-05
lr_scheduler_type: constant_with_warmup
warmup_ratio: 0.05
weight_decay: 0.0
max_grad_norm: 1.0
optim: paged_adamw_8bit
num_train_epochs: 1
saves_per_epoch: 2
save_total_limit: 4
rolling_save_steps: 30
rolling_save_total_limit: 1
assistant_only_loss: true
full_mask_reasoning: true
logging_steps: 1
disable_tqdm: false
report_to: wandb
run_name: g4-31b-it-glimmer-rp
```
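
The `lora_target_modules` value is a regular expression rather than a list of names, so it can help to check which modules it selects. The sketch below uses only Python's `re` module. It assumes the pattern is applied as a full match against each dotted module name, which is how PEFT is understood to treat a string `target_modules`. The module names in `CANDIDATES` are hypothetical examples, not names read from the actual checkpoint.

```python
import re

# Pattern copied from the training config.
TARGET_MODULES = (
    r".*language_model\.layers\.\d+\."
    r"(self_attn\.(q|k|v|o)_proj|mlp\.(gate|up|down)_proj)$"
)

# Hypothetical module names for illustration only.
CANDIDATES = [
    "model.language_model.layers.0.self_attn.q_proj",
    "model.language_model.layers.12.mlp.down_proj",
    "model.language_model.layers.12.self_attn.q_norm",
    "model.vision_tower.encoder.layers.0.self_attn.q_proj",
    "lm_head",
]


def is_lora_target(name: str, pattern: str = TARGET_MODULES) -> bool:
    """Return True if the whole module name matches the target pattern."""
    return re.fullmatch(pattern, name) is not None


matched = [name for name in CANDIDATES if is_lora_target(name)]

if __name__ == "__main__":
    for name in CANDIDATES:
        marker = "LoRA" if is_lora_target(name) else "skip"
        print(f"[{marker}] {name}")
```

Under these assumptions, only the attention and MLP projections inside `language_model` decoder layers receive adapters. Normalization layers, the output head, and any module outside `language_model` are left untouched.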

### Data config

```yaml
datasets:
  - path: rp_generation_final.jsonl
    type: chat
    truncation_strategy: split
  - path: rp_generation_mistral.jsonl
    type: chat
    truncation_strategy: split
  - path: instruct.jsonl
    type: chat
    truncation_strategy: split
  - path: rp_analysis.jsonl
    type: chat
    truncation_strategy: split
  - path: writing_critique.jsonl
    type: chat
    truncation_strategy: split
  - path: marvin_style_bible.jsonl
    type: chat
    truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0
split_seed: 42
assistant_only_loss: true
```

### Framework versions

- PEFT: 0.18.1
- Loft: 0.1.0
- Transformers: 5.5.4
- PyTorch: 2.6.0+cu124
- Datasets: 4.6.1
- Tokenizers: 0.22.2