LLaVA-RAD Targeted LoRA (Layers 14-18): Multi-task n=12K
LoRA adapter for microsoft/llava-rad,
released as part of "Mechanistically Guided LoRA Improves Paraphrase
Consistency in Medical Vision-Language Models" (Sadanadan & Behzadan,
CHIL 2026).
This is the cross-architecture replication of the targeted-layer LoRA arm on a different VLM family. Layers 14-18 of the LLaMA decoder are targeted, mapping to the same 44-56% relative depth as layers 15-19 in the 34-layer MedGemma. The replication tests whether the mechanistic-LoRA result generalises beyond the Gemma 2 architecture.
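The depth correspondence can be checked with a few lines of arithmetic. This is a minimal sketch that assumes relative depth means layer index divided by total layer count; the paper may use a slightly different convention. The function name is hypothetical.

```python
def relative_depth_span(first_layer, last_layer, num_layers):
    """Return (start, end) of a layer range as fractions of total depth.

    Assumption: relative depth = layer index / number of layers.
    """
    return first_layer / num_layers, last_layer / num_layers


# Layer ranges and layer counts as stated in this model card.
llava_span = relative_depth_span(14, 18, 32)     # LLaVA-RAD decoder, 32 layers
medgemma_span = relative_depth_span(15, 19, 34)  # MedGemma, 34 layers

for name, (start, end) in [("LLaVA-RAD", llava_span), ("MedGemma", medgemma_span)]:
    print(f"{name}: {start * 100:.2f}% to {end * 100:.2f}% of depth")
```

Under this convention both ranges round to the 44-56% band quoted above.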
This release corresponds to the multi-task n=12K scale-up of the n=500/n=2000 binary checkpoints reported in the submitted CHIL paper.
Training
| Setting | Value |
|---|---|
| Base model | microsoft/llava-rad (Vicuna-7B + BiomedCLIP-CXR) |
| Adapter rank (r) | 16 |
| alpha | 32 |
| Dropout | 0.05 |
| Learning rate | 2e-4 |
| Effective batch size | 8 (batch 1, grad-accum 8) |
| Epochs | 3 |
| Target layers | 14-18 of 32 |
| Target modules | Q, K, V, O attention projections + gate, up, down MLP projections |
| Training data | MIMIC-CXR train split, all question types, ~2,865 unique questions × 3 epochs of random paraphrase sampling ≈ 8,600 paraphrase pairs |
| Loss | Sequence-level cross-entropy on first answer token + symmetric KL divergence between paraphrase predictions |
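
The loss row combines a cross-entropy term with a symmetric KL term between the predictions for two paraphrases of the same question. The sketch below illustrates that combination on toy logits in plain Python. The weighting (`kl_weight`), the averaging of the cross-entropy over both paraphrases, and the use of an unnormalised sum for the symmetric KL are all assumptions made for illustration, not details confirmed by this card.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def kl_divergence(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def symmetric_kl(log_p, log_q):
    """KL(p || q) + KL(q || p); zero when the two distributions agree."""
    return kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p)


def paraphrase_loss(logits_a, logits_b, target_index, kl_weight=1.0):
    """Cross-entropy on the first answer token plus a consistency penalty.

    logits_a / logits_b: first-answer-token logits for two paraphrases.
    kl_weight: hypothetical weighting between the two terms.
    """
    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    # Assumption: cross-entropy is averaged over both paraphrases.
    ce = -0.5 * (log_p[target_index] + log_q[target_index])
    return ce + kl_weight * symmetric_kl(log_p, log_q)


# Toy example: two-way answer vocabulary, correct answer at index 0.
consistent = paraphrase_loss([2.0, 0.5], [2.0, 0.5], target_index=0)
inconsistent = paraphrase_loss([2.0, 0.5], [0.5, 2.0], target_index=0)
print(consistent, inconsistent)
```

When both paraphrases yield the same distribution the KL term vanishes and only the cross-entropy remains; disagreement between paraphrases raises the loss even if one of them is correct.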
Usage
Loading LLaVA-RAD requires the base-model components in addition to this
adapter. See microsoft/llava-rad
for base-model loading instructions.
# After loading the LLaVA-RAD base model with its mm_projector and original
# LoRA merged (see LLaVA-RAD model card), apply this adapter:
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, "saillab/llava-rad-targeted-lora-mimic-mt-12k")
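
To see which decoder modules fall inside the targeted range listed in the Training table, a name pattern can be matched against module names (for example those yielded by `named_modules()` on the base model). This is a rough sketch: it assumes the standard LLaMA module names used in transformers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`), assumes layers are 0-indexed, and uses hypothetical helper names. It is not taken from the authors' training code.

```python
import re

# Layers 14-18 inclusive; 0-based indexing is an assumption.
TARGET_LAYERS = range(14, 19)
ATTENTION_PROJECTIONS = ["q_proj", "k_proj", "v_proj", "o_proj"]
MLP_PROJECTIONS = ["gate_proj", "up_proj", "down_proj"]


def build_target_pattern(layers, attention, mlp):
    """Build a regex that matches attention/MLP projections in the given layers."""
    layer_alt = "|".join(str(i) for i in layers)
    attn_alt = "|".join(attention)
    mlp_alt = "|".join(mlp)
    return (
        rf".*\.layers\.({layer_alt})\."
        rf"(self_attn\.({attn_alt})|mlp\.({mlp_alt}))"
    )


PATTERN = build_target_pattern(TARGET_LAYERS, ATTENTION_PROJECTIONS, MLP_PROJECTIONS)


def is_targeted(module_name):
    """True if the full module name matches the targeted-layer pattern."""
    return re.fullmatch(PATTERN, module_name) is not None


# Illustrative module names following the LLaMA naming scheme.
for name in [
    "model.layers.14.self_attn.q_proj",
    "model.layers.18.mlp.down_proj",
    "model.layers.13.mlp.up_proj",
    "model.layers.19.self_attn.v_proj",
]:
    print(name, is_targeted(name))
```

PEFT's `LoraConfig` treats a string passed as `target_modules` as a regex, so a pattern of this shape is one plausible way to restrict an adapter to a band of layers.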
Intended use
Research on cross-architecture paraphrase robustness in medical Vision-Language Models. Not for clinical use.
Citation (primary, CHIL 2026)
@inproceedings{sadanadan2026mechanistic,
title = {Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models},
author = {Sadanadan, Binesh and Behzadan, Vahid},
booktitle = {Conference on Health, Inference, and Learning (CHIL)},
year = {2026}
}
Companion evaluation work
@misc{sadanadan2026heatmap,
title = {Attention Without Grounding: Causal Evaluation of Visual Explanations in Medical Vision-Language Models},
author = {Sadanadan, Binesh and Behzadan, Vahid},
year = {2026},
note = {Pre-print, SAIL Lab, University of New Haven}
}
License
Distributed under the LLaVA-RAD research license, inheriting the licensing terms of the base model.