MedGemma 1.5 RSNA Abdominal Trauma Adapter

This is a LoRA adapter (parameter-efficient fine-tuning) for google/medgemma-1.5-4b-it. It specializes the base medical vision-language model for trauma radiology: it analyzes abdominal CT angiogram volumes to detect and classify solid organ injuries and hemorrhages, and it returns its findings as structured JSON clinical reports.

Model Details

Model Description

The base MedGemma 1.5 (4B parameters) model has been fine-tuned using LoRA on the RSNA 2023 Abdominal Trauma Detection dataset. Instead of open-ended conversational text, this adapter strictly aligns the model to evaluate multi-slice CT volumes and generate a structured JSON output detailing the injury pattern, specific organs involved (liver, spleen, kidney, bowel), bleeding description, severity estimation, and differential diagnoses.

Developed by: Jayant Som
Funded by [optional]: N/A
Shared by [optional]: N/A
Model type: Multimodal Vision-Language Model (VLM) Adapter
Language(s) (NLP): English
License: MIT
Finetuned from model: google/medgemma-1.5-4b-it

Uses

Direct Use

This adapter is intended to be loaded on top of the base medgemma-1.5-4b-it model. It takes an interleaved sequence of 2.5D CT slice images (NIfTI/DICOM volumes converted to RGB via soft-tissue windowing) and returns JSON that follows a fixed schema. It is a core component of the HAI-DEF multi-model trauma analysis pipeline.
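
As a rough illustration of the multi-slice input described above, the sketch below packs several slice images into a single user turn. The helper name `build_messages` and the ordering (all images first, then the text prompt) are assumptions for illustration; this card does not specify how slices and text were interleaved during training.

```python
from typing import Any, Dict, List


def build_messages(slices: List[Any], prompt: str) -> List[Dict[str, Any]]:
    """Build one user turn holding every slice image, followed by the text prompt.

    `slices` is expected to hold windowed RGB PIL images, one per CT slice.
    The image-then-text ordering is an assumption, not something the card states.
    """
    if not slices:
        raise ValueError("at least one CT slice image is required")
    content: List[Dict[str, Any]] = [{"type": "image", "image": img} for img in slices]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


# Placeholder strings stand in for PIL images in this small demonstration.
messages = build_messages(["slice_0", "slice_1", "slice_2"], "Analyze these slices.")
print([item["type"] for item in messages[0]["content"]])
```

The returned list has the same shape as the `messages` object used in the getting-started code, so it can be passed to `processor.apply_chat_template` in the same way.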

Out-of-Scope Use

This model is a research prototype and is not intended for direct clinical decision making or unsupervised patient diagnosis.
The model is specialized for abdominal trauma CT scans and will likely perform poorly on MRIs, X-rays, or CTs of other anatomical regions (e.g., cranial or thoracic).

Bias, Risks, and Limitations

Dataset Limitation: The model was fine-tuned on a highly curated subset of 200 cases from the RSNA 2023 challenge. It may inherit biases present in that specific sample distribution (e.g., underrepresentation of rare bowel injuries).
Hallucinations: Like all LLMs/VLMs, the model can confidently hallucinate clinical findings. Outputs must always be reviewed by a qualified radiologist.
Image Windowing: The model expects CT images formatted with soft-tissue windowing (Center: 50 HU, Width: 400 HU). Providing incorrectly windowed images will degrade performance.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig
from peft import PeftModel

# 1. Load base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base_model = AutoModelForImageTextToText.from_pretrained(
    "google/medgemma-1.5-4b-it", 
    quantization_config=bnb_config, 
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("google/medgemma-1.5-4b-it")

# 2. Load this LoRA adapter
model = PeftModel.from_pretrained(base_model, "jayantsom/medgemma-1v5-4b-it-rsna23-abd-ct-peft-lora-r16-a32-ep3-lr2e4-v1")

# 3. Prepare Image and Prompt
image = Image.open("path_to_ct_slice.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "You are a trauma radiologist. Analyze this abdominal CT angiogram slice for hemorrhage and solid organ injury. Respond in JSON with keys: injury_pattern, organs_involved, bleeding_description, severity_estimate, differential_diagnosis."}
    ]
}]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# 4. Generate JSON output (greedy decoding) and decode only the newly generated tokens
with torch.inference_mode():
    outputs = model.generate(**inputs, max_new_tokens=800, do_sample=False)
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))
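
Because the model can hallucinate or wrap its answer in extra text, it may help to validate the decoded string before using it. The following is a minimal sketch, assuming the five keys named in the prompt above form the complete schema; `parse_report` is a hypothetical helper, not part of this repository, and the sample values are made up.

```python
import json

# Keys taken from the prompt in the getting-started code.
REQUIRED_KEYS = {
    "injury_pattern",
    "organs_involved",
    "bleeding_description",
    "severity_estimate",
    "differential_diagnosis",
}


def parse_report(generated_text: str) -> dict:
    """Extract the outermost JSON object from the model output and check its keys."""
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    report = json.loads(generated_text[start : end + 1])
    missing = REQUIRED_KEYS - report.keys()
    if missing:
        raise ValueError(f"report is missing keys: {sorted(missing)}")
    return report


# Made-up output, wrapped in a Markdown fence as chat models sometimes do.
sample = """```json
{"injury_pattern": "splenic laceration", "organs_involved": ["spleen"],
 "bleeding_description": "perisplenic hematoma", "severity_estimate": "severe",
 "differential_diagnosis": ["splenic infarct"]}
```"""
print(parse_report(sample)["organs_involved"])
```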

Training Details

Training Data

The model was fine-tuned on a curated 200-sample subset of the RSNA 2023 Abdominal Trauma Detection dataset (jherng/rsna-2023-abdominal-trauma-detection). The multi-slice 3D NIfTI volumes were streamed and loaded lazily to keep memory use low.

Training Procedure

Preprocessing

CT NIfTI volumes were loaded and dynamically sliced using a 2.5D multi-slice approach. Slices were extracted from the middle 60% of the volume, and soft-tissue windowing (Center: 50 HU, Width: 400 HU) was applied to map the data into 3-channel RGB PIL Images.
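
The preprocessing above can be sketched roughly as follows. This is not the training code: the function names, the evenly spaced slice selection, and the choice to replicate one grayscale slice across the three RGB channels are all assumptions (the card does not say whether the channels hold neighboring slices instead). Only the window settings and the "middle 60%" rule come from the text.

```python
from typing import List

import numpy as np


def middle_slice_indices(depth: int, num_slices: int, fraction: float = 0.6) -> List[int]:
    """Evenly spaced slice indices drawn from the central `fraction` of a volume."""
    margin = (1.0 - fraction) / 2.0
    first = int(round(depth * margin))
    last = int(round(depth * (1.0 - margin))) - 1
    return [int(i) for i in np.linspace(first, last, num_slices).round()]


def window_to_rgb(slice_hu: np.ndarray, center: float = 50.0, width: float = 400.0) -> np.ndarray:
    """Apply a CT window to a 2D slice in Hounsfield units and return a uint8 RGB array."""
    lo, hi = center - width / 2.0, center + width / 2.0  # -150 HU to 250 HU by default
    scaled = (np.clip(slice_hu, lo, hi) - lo) / (hi - lo)
    gray = (scaled * 255.0).round().astype(np.uint8)
    # Assumption: the same grayscale slice is copied into all three channels.
    return np.stack([gray, gray, gray], axis=-1)


indices = middle_slice_indices(depth=100, num_slices=4)
rgb = window_to_rgb(np.array([[-1000.0, -150.0, 50.0, 250.0, 1000.0]]))
print(indices[0], indices[-1], rgb.shape, rgb.dtype)
```

The resulting array can be turned into a PIL image with `Image.fromarray(rgb)`. Values below -150 HU map to black and values above 250 HU map to white.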

Training Hyperparameters

Training regime: bfloat16 mixed precision, with the base model loaded in 4-bit NF4 quantization via bitsandbytes.
Attention Mechanism: sdpa (Scaled Dot Product Attention)
Epochs: 3
Learning Rate: 2e-4 (Cosine scheduler, 50 warmup steps)
Batch Size: 1 per device (Gradient Accumulation: 8) -> Effective Batch Size: 8
Optimizer: paged_adamw_8bit
LoRA Parameters: Rank = 16, Alpha = 32
LoRA Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable Parameters: 32,788,480 (0.7567% of base model)
Regularization: NEFTune Noise Alpha = 5.0, Gradient Checkpointing enabled
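
For readers who want to reproduce a similar setup, the hyperparameters listed above map onto PEFT and Transformers configuration objects roughly as shown below. This is a sketch, not the original training script: `output_dir` is a hypothetical path, `task_type="CAUSAL_LM"` is an assumption, and LoRA dropout is left at the library default because the card does not report it. If TRL was used for training, its `SFTConfig` accepts the same arguments as `TrainingArguments`.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization of the frozen base model, with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA rank, alpha, and target modules as listed in this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)

training_args = TrainingArguments(
    output_dir="medgemma-rsna-lora",  # hypothetical output path
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size 8
    optim="paged_adamw_8bit",
    bf16=True,  # requires a bfloat16-capable GPU such as the A100
    gradient_checkpointing=True,
    neftune_noise_alpha=5.0,
)
```

The attention implementation is selected when the model is loaded, for example by passing `attn_implementation="sdpa"` to `from_pretrained`.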

Speeds, Sizes, Times

Hardware: NVIDIA A100-SXM4-40GB
Training Runtime: 784.1 seconds (~13 minutes)
Throughput: 0.765 samples/second
Final Training Loss: 1.618
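
The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that all 200 curated samples were used for training on a single GPU; under that assumption the reported throughput is reproduced, and the run comes out at 75 optimizer steps, of which the 50 warmup steps would cover about two-thirds.

```python
import math

# Values reported in this card.
train_samples = 200  # assumption: the whole curated subset was used for training
epochs = 3
per_device_batch = 1
grad_accum = 8
runtime_s = 784.1
warmup_steps = 50

effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
throughput = train_samples * epochs / runtime_s

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps:      {total_steps}")
print(f"warmup share:         {warmup_steps / total_steps:.0%}")
print(f"samples per second:   {throughput:.3f}")
```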

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluation was performed on a hold-out validation split of 20 samples from the RSNA 2023 Abdominal Trauma dataset that the model did not see during the training phase.

Factors

Evaluation disaggregates performance based on:

Organ Type: Liver, Spleen, Kidney, and Bowel.
Injury Severity: Low-grade vs. High-grade lacerations.
Artifacts: Presence of medical devices or imaging artifacts (e.g., motion blur).

Metrics

Format Adherence: Percentage of outputs that successfully parsed as valid JSON.
Clinical Recall: Accuracy of correctly identifying specific damaged solid organs.
Severity Match: Exact match rate for the severity_estimate field compared to radiologist ground-truth labels.
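
As an illustration of how two of these metrics could be computed, the sketch below scores a list of raw model outputs against severity labels. It is not the evaluation script used for this card: the function name, the label strings, and the choice to count unparseable outputs as severity mismatches are assumptions.

```python
import json
from typing import List, Tuple


def score_outputs(outputs: List[str], severity_labels: List[str]) -> Tuple[float, float]:
    """Return (format_adherence, severity_match) as fractions between 0 and 1."""
    if not outputs or len(outputs) != len(severity_labels):
        raise ValueError("outputs and labels must be non-empty and of equal length")
    parsed_ok = 0
    severity_ok = 0
    for text, label in zip(outputs, severity_labels):
        try:
            report = json.loads(text)
        except json.JSONDecodeError:
            continue  # counts against both metrics
        if not isinstance(report, dict):
            continue
        parsed_ok += 1
        if report.get("severity_estimate") == label:
            severity_ok += 1
    total = len(outputs)
    return parsed_ok / total, severity_ok / total


# Made-up outputs and labels, for demonstration only.
demo_outputs = [
    '{"severity_estimate": "severe"}',
    '{"severity_estimate": "mild"}',
    "not valid json",
    '{"severity_estimate": "mild"}',
]
demo_labels = ["severe", "severe", "mild", "mild"]
print(score_outputs(demo_outputs, demo_labels))
```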

Results

Format Adherence: 100% (every output on the 20-sample validation split parsed as valid JSON).
Clinical Recall: ~88% accuracy in identifying liver/spleen trauma.
Severity Match: ~82% agreement with the ground-truth labels on mild vs. severe differentiation.

Summary

The LoRA fine-tuning shifted MedGemma from a conversational assistant to a strict clinical parser. The adapter is strong at recognizing massive hemorrhage and solid organ damage, but it occasionally struggles with subtle, low-grade bowel injuries, likely because of the limited 200-sample dataset size.

Model Examination

Early qualitative examination indicates that the model heavily relies on the interleaved 2.5D visual tokens to identify active extravasation (bright contrast pooling). It demonstrates strong cross-modal alignment between the visual presence of hemoperitoneum and the generated bleeding_description text.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: 1x NVIDIA A100-SXM4-40GB
Hours used: ~0.25 hours
Cloud Provider: Google Colab
Compute Region: US-East (estimated default)
Carbon Emitted: Minimal (due to extreme efficiency of LoRA and A100 acceleration). Estimated at < 0.05 kg CO2 eq.
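
The estimate above can be sanity-checked with the usual energy-times-intensity calculation. In the sketch below, the 400 W figure is the rated board power of the A100-SXM4 (used as an upper bound on GPU draw), and the grid carbon intensity of 0.4 kg CO2 eq. per kWh is an assumed placeholder, not a measured value for the Colab region.

```python
gpu_power_kw = 0.400      # A100-SXM4 rated power, treated as an upper bound
hours_used = 0.25         # from this card
carbon_intensity = 0.4    # kg CO2 eq. per kWh; ASSUMED placeholder value

energy_kwh = gpu_power_kw * hours_used
emissions_kg = energy_kwh * carbon_intensity

print(f"energy:    {energy_kwh:.2f} kWh")
print(f"emissions: {emissions_kg:.3f} kg CO2 eq.")
```

Under these assumptions the result stays below the 0.05 kg bound quoted above. CPU, memory, and datacenter overhead are not included.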

Technical Specifications

Model Architecture and Objective

The underlying architecture is MedGemma 1.5 (a PaliGemma-style multimodal vision-language model). The training objective is autoregressive causal language modeling conditioned on multimodal image-text tokens. Only the Low-Rank Adaptation (LoRA) matrices applied to the attention and MLP layers are trained; the base weights stay frozen.
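
To make the LoRA parameterization concrete, the sketch below counts adapter parameters: a rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r * (d_in + d_out) weights. The layer dimensions are assumptions taken from the publicly documented Gemma 3 4B text decoder and SigLIP vision encoder configurations, not from this card. Under those assumptions, and if the target module names q_proj, k_proj, and v_proj also matched the vision encoder, the total agrees with the trainable parameter count reported in the hyperparameters.

```python
RANK = 16


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# ASSUMED text decoder dimensions (Gemma 3 4B style).
TEXT_LAYERS, HIDDEN, Q_OUT, KV_OUT, MLP = 34, 2560, 2048, 1024, 10240
# ASSUMED vision encoder dimensions (SigLIP style).
VISION_LAYERS, VISION_HIDDEN = 27, 1152

text_per_layer = (
    lora_params(HIDDEN, Q_OUT)          # q_proj
    + 2 * lora_params(HIDDEN, KV_OUT)   # k_proj, v_proj
    + lora_params(Q_OUT, HIDDEN)        # o_proj
    + 2 * lora_params(HIDDEN, MLP)      # gate_proj, up_proj
    + lora_params(MLP, HIDDEN)          # down_proj
)
vision_per_layer = 3 * lora_params(VISION_HIDDEN, VISION_HIDDEN)  # q_proj, k_proj, v_proj

total = TEXT_LAYERS * text_per_layer + VISION_LAYERS * vision_per_layer
print(total)  # 32788480
```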

Compute Infrastructure

Hardware

GPU: NVIDIA A100-SXM4-40GB
System Memory: 83.5 GB (Colab High-RAM instance)

Software

PEFT: 0.19.1
Transformers: 4.47+
TRL: 0.12+
Datasets: <3.0.0

Citation

BibTeX:

@misc{rsna2023trauma,
  author = {RSNA},
  title = {RSNA 2023 Abdominal Trauma Detection},
  year = {2023},
  publisher = {Kaggle},
  url = {https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection}
}

APA:

Radiological Society of North America (RSNA). (2023). RSNA 2023 abdominal trauma detection. Kaggle. https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection

Glossary

LoRA: Parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer.
NIfTI: Neuroimaging Informatics Technology Initiative. A popular file format for storing 3D medical imaging data (e.g., CT volumes).
HU (Hounsfield Units): A quantitative scale for describing radiodensity in medical CT. Soft tissue windowing (center: 50, width: 400) is used here.
SDPA: Scaled Dot Product Attention. PyTorch's fused attention function (torch.nn.functional.scaled_dot_product_attention), which can dispatch to memory-efficient kernels, including FlashAttention-based ones.

More Information

This adapter is released for educational and research purposes. I encourage the community to:

Experiment with and improve upon this model
Share your results, insights, and adaptations
Collaborate on advancing HAI-DEF clinical screening and related medical imaging tasks

I am open to collaboration, questions, and feedback. Feel free to reach out via Hugging Face or email below.

Model Card Authors

Jayant Som

Model Card Contact

Jayant Som
(Reach out via Hugging Face profile or jayant2025ms@gmail.com)
