LLaVA-RAD Targeted LoRA (Layers 14-18) — Multi-task n=12K

LoRA adapter for microsoft/llava-rad, released as part of "Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models" (Sadanadan & Behzadan, CHIL 2026).

This is the cross-architecture replication of the targeted-layer LoRA arm on a different VLM family. Layers 14-18 of the 32-layer LLaMA (Vicuna-7B) decoder are targeted. They sit at the same 44-56% relative depth as layers 15-19 in the 34-layer MedGemma. The replication tests whether the mechanistic-LoRA result generalises beyond the Gemma 3 architecture that MedGemma is built on.
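The depth correspondence above is simple arithmetic. The sketch below reproduces it. It assumes that relative depth is defined as layer index divided by the number of decoder layers. That convention is an assumption here, chosen because it yields the quoted 44-56% band for both models.

```python
# Relative depth of a decoder layer, assuming depth = layer_index / num_layers.
# This convention is an assumption; it reproduces the 44-56% band quoted above.

def relative_depth(layer_index: int, num_layers: int) -> float:
    """Position of a layer in the stack as a fraction of total depth."""
    return layer_index / num_layers


def depth_band(first: int, last: int, num_layers: int) -> tuple[int, int]:
    """Rounded percentage band covered by layers first..last (inclusive)."""
    return (
        round(100 * relative_depth(first, num_layers)),
        round(100 * relative_depth(last, num_layers)),
    )


llava_rad_band = depth_band(14, 18, num_layers=32)  # LLaMA (Vicuna-7B) decoder
medgemma_band = depth_band(15, 19, num_layers=34)   # MedGemma decoder
print(llava_rad_band, medgemma_band)  # (44, 56) (44, 56)
```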

This release is the multi-task n=12K scale-up of the binary n=500 and n=2000 checkpoints reported in the submitted CHIL paper.

Training

Setting	Value
Base model	`microsoft/llava-rad` (Vicuna-7B + BiomedCLIP-CXR)
Adapter rank (`r`)	16
`alpha`	32
Dropout	0.05
Learning rate	2e-4
Effective batch size	8 (batch 1, grad-accum 8)
Epochs	3
Target layers	14-18 of 32
Target modules	Q, K, V, O attention projections + gate, up, down MLP projections
Training data	MIMIC-CXR train split, all question types, ~2,865 unique questions × 3 epochs of random paraphrase sampling ≈ 8,600 paraphrase pairs
Loss	Sequence-level cross-entropy on first answer token + symmetric KL divergence between paraphrase predictions
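Below is a minimal sketch of the training setup in the table, under stated assumptions.

`LORA_KWARGS` is a hypothetical reconstruction of the adapter settings. It uses the keyword names of `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`, `layers_to_transform`) and the standard Hugging Face LLaMA module names. It is not copied from the released adapter config.

`paraphrase_loss` is one plausible reading of the loss row. It assumes that:

- cross-entropy is applied to the first answer token of both paraphrases;
- the symmetric KL term compares the two first-token distributions;
- the two terms are combined with a hypothetical weight `lam`.

It is written in plain Python to make the arithmetic explicit. It is not the authors' training code.

```python
import math

# Hypothetical reconstruction of the table using peft.LoraConfig keyword names;
# not the released adapter_config.json.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "layers_to_transform": list(range(14, 19)),  # layers 14-18 inclusive
}

# Standard LoRA scales the low-rank update by alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]  # 2.0
EFFECTIVE_BATCH = 1 * 8  # per-step batch 1 x gradient accumulation 8


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl(logp, logq):
    """KL(p || q), given both distributions as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def paraphrase_loss(logits_a, logits_b, target, lam=1.0):
    """First-answer-token cross-entropy (averaged over both paraphrases)
    plus lam * symmetric KL between their first-token distributions.

    logits_a, logits_b: first-answer-token logits for two paraphrases.
    target: index of the gold first answer token.
    lam: hypothetical weight on the consistency term.
    """
    logp_a = log_softmax(logits_a)
    logp_b = log_softmax(logits_b)
    ce = -(logp_a[target] + logp_b[target]) / 2
    sym_kl = kl(logp_a, logp_b) + kl(logp_b, logp_a)
    return ce + lam * sym_kl
```

With identical logits for both paraphrases the KL term vanishes and the loss reduces to plain cross-entropy. Disagreement between paraphrases adds a non-negative penalty. A real training loop would compute the same quantities on batched tensors.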

Usage

Loading LLaVA-RAD requires the base-model components in addition to this adapter. See microsoft/llava-rad for base-model loading instructions.

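```python
# After loading the LLaVA-RAD base model with its mm_projector and original
# LoRA merged (see the LLaVA-RAD model card), apply this adapter.
from peft import PeftModel

# `base_model` is the fully assembled LLaVA-RAD model from the step above.
model = PeftModel.from_pretrained(base_model, "saillab/llava-rad-targeted-lora-mimic-mt-12k")
model.eval()  # inference mode: disables LoRA dropout
```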
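To confirm that a loaded adapter touches only the intended decoder layers, you can scan the model's parameter names. The helper below is a hypothetical utility, not part of PEFT. It assumes PEFT's usual naming, in which LoRA weights appear as `...layers.<i>.<module>.lora_A.<adapter>.weight` (and `lora_B`). The example names are made up for illustration.

```python
import re

# Matches LoRA weights inside a decoder layer, e.g.
# "base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight".
# The naming pattern is an assumption based on PEFT's usual conventions.
_LORA_LAYER = re.compile(r"\.layers\.(\d+)\..*\.lora_[AB]\.")


def lora_layer_indices(param_names):
    """Return the sorted decoder-layer indices that carry LoRA weights (hypothetical helper)."""
    found = set()
    for name in param_names:
        match = _LORA_LAYER.search(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


# Illustrative (made-up) parameter names; the last one is a frozen base weight.
example_names = [
    "base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight",
    "base_model.model.model.layers.18.mlp.down_proj.lora_B.default.weight",
    "base_model.model.model.layers.3.self_attn.q_proj.weight",
]
print(lora_layer_indices(example_names))  # [14, 18]
```

With the loaded model, pass `(name for name, _ in model.named_parameters())` to the helper. For this adapter the result should cover layers 14 through 18 only.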

Intended use

Research on cross-architecture paraphrase robustness in medical Vision-Language Models. Not for clinical use.

Citation (primary — CHIL 2026)

@inproceedings{sadanadan2026mechanistic,
  title     = {Mechanistically Guided LoRA Improves Paraphrase Consistency in Medical Vision-Language Models},
  author    = {Sadanadan, Binesh and Behzadan, Vahid},
  booktitle = {Conference on Health, Inference, and Learning (CHIL)},
  year      = {2026}
}

Companion evaluation work

@misc{sadanadan2026heatmap,
  title  = {Attention Without Grounding: Causal Evaluation of Visual Explanations in Medical Vision-Language Models},
  author = {Sadanadan, Binesh and Behzadan, Vahid},
  year   = {2026},
  note   = {Pre-print, SAIL Lab, University of New Haven}
}

License

Distributed under the LLaVA-RAD research license, inheriting the licensing terms of the base model.