---
language:
- vi
- en
license: apache-2.0
tags:
- medical-vision
- medgemma
- vision-language-model
- lora
- unsloth
- spider-dataset
---
# medgemma-spider-finetuned
A MedGemma model fine-tuned on the Spider dataset for medical image analysis, where each patient sample consists of multiple images.
## 📋 Model Information
- **Base Model**: `google/medgemma-4b-it`
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Dataset**: Spider dataset (series format: 1 patient = multiple images)
- **Training Framework**: Unsloth (which advertises roughly 2x faster training)
## 📂 Folder Structure
```
output_medgemma_spider/
├── final_model/          # Full merged model (large)
├── lora_adapters/        # LoRA adapters only (recommended, lightweight)
├── checkpoint-*/         # Training checkpoints
├── trainer_state.json    # Training state
└── eval_metrics.json     # Evaluation metrics
```
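Because the adapters are much smaller than the merged model, it can be convenient to fetch only one folder from the repository. The following is a minimal sketch using `snapshot_download` from `huggingface_hub` with its `allow_patterns` argument. The helper names `subfolder_patterns` and `download_subfolder` are hypothetical, and the sketch assumes `lora_adapters/` sits at the top level of the hub repository, which the tree above does not confirm.

```python
def subfolder_patterns(subfolder: str) -> list[str]:
    """Return glob patterns matching every file under one repo folder."""
    return [f"{subfolder.strip('/')}/*"]


def download_subfolder(repo_id: str, subfolder: str) -> str:
    """Download a single folder of a hub repo and return the local snapshot path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=subfolder_patterns(subfolder),
    )


# Example call (requires network access; assumes the folder exists in the repo):
# local_dir = download_subfolder("ImNotTam/medgemma-spider-finetuned", "lora_adapters")
```

Passing `"final_model"` instead would fetch the merged weights, again assuming that folder name matches the repository layout.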
## 🚀 Usage
### 1️⃣ Load LoRA Adapters (Recommended - Lightweight)
```python
from unsloth import FastVisionModel

model, processor = FastVisionModel.from_pretrained(
    model_name="ImNotTam/medgemma-spider-finetuned",
    subfolder="lora_adapters",
    load_in_4bit=True,
)

# Enable inference mode
FastVisionModel.for_inference(model)

# Prepare input with multiple images (add more paths as needed)
image_paths = ["path/to/image1.png", "path/to/image2.png"]
question = "What do you see in these images?"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img_path} for img_path in image_paths
        ] + [{"type": "text", "text": question}],
    }
]

# Tokenize the chat; return_dict=True yields a mapping that can be unpacked into generate()
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to("cuda")

# Generate, then decode only the newly generated tokens (not the echoed prompt)
outputs = model.generate(**inputs, max_new_tokens=512)
prompt_length = inputs["input_ids"].shape[-1]
response = processor.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```
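The message structure used above (all images first, then one text item) can be wrapped in a small helper. This is a sketch, not part of the released code: `build_messages` is a hypothetical name, and the cap of 18 images is taken from the training configuration listed under Training Details. Whether the model tolerates more images at inference time is not stated, so rejecting longer series is an assumption.

```python
from typing import Any

MAX_IMAGES_PER_SAMPLE = 18  # training-time limit from the model card; assumed to apply at inference


def build_messages(
    image_paths: list[str],
    question: str,
    max_images: int = MAX_IMAGES_PER_SAMPLE,
) -> list[dict[str, Any]]:
    """Build a single-turn chat message with several images followed by a question."""
    if not image_paths:
        raise ValueError("at least one image path is required")
    if len(image_paths) > max_images:
        raise ValueError(f"got {len(image_paths)} images, expected at most {max_images}")

    # One content item per image, then the text question last.
    content: list[dict[str, Any]] = [{"type": "image", "image": p} for p in image_paths]
    content.append({"type": "text", "text": question})
    return [{"role": "user", "content": content}]


messages = build_messages(
    ["path/to/image1.png", "path/to/image2.png"],
    "What do you see in these images?",
)
print([item["type"] for item in messages[0]["content"]])
```

The returned list can be passed to `processor.apply_chat_template` in place of the hand-written `messages` literal.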
### 2️⃣ Load Final Model (Full Model)
```python
from transformers import AutoModelForVision2Seq, AutoProcessor

model = AutoModelForVision2Seq.from_pretrained(
    "ImNotTam/medgemma-spider-finetuned",
    subfolder="final_model",
    device_map="auto",
    torch_dtype="auto",
)
processor = AutoProcessor.from_pretrained(
    "ImNotTam/medgemma-spider-finetuned",
    subfolder="final_model",
)

# Inference then follows the same steps as in the LoRA example:
# build the messages, apply the chat template, and call generate()
```
### 3️⃣ Continue Training from LoRA Adapters
```python
from unsloth import FastVisionModel
from trl import SFTTrainer

# Load the LoRA adapters
model, processor = FastVisionModel.from_pretrained(
    model_name="ImNotTam/medgemma-spider-finetuned",
    subfolder="lora_adapters",
    load_in_4bit=True,
)

# Add a new LoRA config to continue training
model = FastVisionModel.get_peft_model(
    model,
    r=24,
    lora_alpha=48,
    lora_dropout=0.1,
    finetune_vision_layers=True,
    finetune_language_layers=True,
)

# Train on new data
trainer = SFTTrainer(
    model=model,
    tokenizer=processor,
    train_dataset=your_new_dataset,
    # ... training args
)
trainer.train()
```
## 📊 Training Details
- **LoRA Rank**: 24
- **LoRA Alpha**: 48
- **LoRA Dropout**: 0.1
- **Batch Size**: 2 (per device)
- **Gradient Accumulation**: 12 steps
- **Effective Batch Size**: 24
- **Learning Rate**: 2e-4
- **Max Sequence Length**: 1280
- **Max Images per Sample**: 18
- **Epochs**: 7
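
Two of the values above are derived from the others. The short calculation below shows how, under two assumptions: training ran on a single device (the device count is not stated, but it is the only count consistent with an effective batch size of 24), and the standard LoRA scaling factor `alpha / rank` was used rather than a rank-stabilized variant.

```python
per_device_batch_size = 2
gradient_accumulation_steps = 12
num_devices = 1  # assumption: single GPU, not stated in the model card

# Samples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

lora_rank = 24
lora_alpha = 48

# Standard LoRA multiplies the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank

print(effective_batch_size)  # 24
print(lora_scaling)          # 2.0
```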
## 💡 Recommendations
- **For Inference**: Use `lora_adapters/` (lightweight, fast)
- **For Production**: Use `final_model/` (full model)
- **For Continued Training**: Load `lora_adapters/` + add new LoRA config
## 📦 Requirements
```bash
pip install unsloth transformers torch trl pillow
```
## 📄 License
Apache 2.0