Gemma-Train

Sleeping

turtle170 commited on Jan 20

Commit

c4662a8

verified ·

1 Parent(s): 477248c

Update README.md

Files changed (1) hide show

README.md CHANGED Viewed

@@ -31,7 +31,7 @@ The goal is to move beyond standard Low-Rank Adaptation (LoRA) to observe how fu
 ## 🛠️ Hardware Requirements & Grant Justification
 NVIDIA L40S
-Because Gemma 3 is a multimodal model, the vision-language alignment layers and the full-parameter gradient states require the 24GB VRAM capacity of the A10G. Using an A10G-large will allow for faster dataset tokenization and more efficient model sharding during the "Push to Hub" phase, reducing the total grant time used.
 ## 🧪 Methodology
 - **Training Type:** Full-Model SFT (Supervised Fine-Tuning)

 ## 🛠️ Hardware Requirements & Grant Justification
 NVIDIA L40S
+Because Gemma 3 is a multimodal model, the vision-language alignment layers and the full-parameter gradient states require the **48GB VRAM capacity of the L40S**. This high memory ceiling is essential for maintaining stability during the SFT process and preventing OOM (Out of Memory) errors when calculating multimodal attention gradients at 4B scale. Using an L40S will allow for faster dataset tokenization and more efficient model sharding, significantly reducing the total grant time used.
 ## 🧪 Methodology
 - **Training Type:** Full-Model SFT (Supervised Fine-Tuning)