turtle170 committed on
Commit c4662a8 · verified · 1 Parent(s): 477248c

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -31,7 +31,7 @@ The goal is to move beyond standard Low-Rank Adaptation (LoRA) to observe how fu
  ## 🛠️ Hardware Requirements & Grant Justification
  NVIDIA L40S
 
- Because Gemma 3 is a multimodal model, the vision-language alignment layers and the full-parameter gradient states require the 24GB VRAM capacity of the A10G. Using an A10G-large will allow for faster dataset tokenization and more efficient model sharding during the "Push to Hub" phase, reducing the total grant time used.
+ Because Gemma 3 is a multimodal model, the vision-language alignment layers and the full-parameter gradient states require the **48GB VRAM capacity of the L40S**. This high memory ceiling is essential for maintaining stability during the SFT process and preventing OOM (Out of Memory) errors when calculating multimodal attention gradients at 4B scale. Using an L40S will allow for faster dataset tokenization and more efficient model sharding, significantly reducing the total grant time used.
 
  ## 🧪 Methodology
  - **Training Type:** Full-Model SFT (Supervised Fine-Tuning)
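
The move from a 24GB to a 48GB card can be sanity-checked with a back-of-envelope estimate of the persistent training state for full-parameter SFT. The sketch below is illustrative and is not taken from the repository: the function name `estimate_training_state_gb`, the round 4e9 parameter count standing in for "4B scale", and the per-parameter byte counts (bf16 weights, bf16 gradients, two fp32 Adam moments) are all assumptions. Activations, the CUDA context, and allocator fragmentation are deliberately left out, so real usage would be higher.

```python
# Rough estimate of persistent GPU memory for full-parameter fine-tuning.
# All byte counts are assumptions for illustration, not measured values.

def estimate_training_state_gb(num_params, weight_bytes=2, grad_bytes=2,
                               optimizer_bytes=8):
    """Return decimal GB needed for weights, gradients, and optimizer state.

    Defaults assume bf16 weights (2 B), bf16 gradients (2 B), and two fp32
    Adam moment buffers (4 B + 4 B). Activations are not included.
    """
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    params = 4_000_000_000  # assumed round figure for a "4B scale" model
    state_gb = estimate_training_state_gb(params)
    # Nominal capacities as quoted in the README text above.
    for card, capacity_gb in (("A10G", 24), ("L40S", 48)):
        headroom = capacity_gb - state_gb
        print(f"{card}: state {state_gb:.1f} GB, headroom {headroom:.1f} GB")
```

Under these assumptions the state alone is well beyond 24GB, which is consistent with the switch away from the A10G. It also lands close to 48GB before any activations are counted, so the estimate is sensitive to the optimizer choice: passing `optimizer_bytes=2` (a stand-in for an 8-bit optimizer with two 1-byte moments) halves the figure.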