MedPMC-CLIP

MedPMC-CLIP is a medical vision-language model based on the OpenCLIP ViT-L-14 architecture. It was trained on MedPMC-11M, a curated dataset of approximately 11 million image-caption pairs derived from biomedical literature. In our evaluations, which cover zero-shot medical image classification on 26 public benchmarks and zero-shot image retrieval on an internal clinical dermatology dataset, MedPMC-CLIP consistently outperforms the existing baseline models we compared against. For details on model training and benchmark results, please refer to our paper (coming soon).
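In CLIP-style models, both zero-shot tasks are typically scored by cosine similarity between image and text embeddings. As a minimal sketch, assuming the embeddings have already been computed (for example with `model.encode_image` and `model.encode_text`, as in the Usage section), text-to-image retrieval ranks a gallery of image embeddings against a single text query embedding. This is a generic NumPy illustration, not the evaluation protocol used in the paper; the helpers `l2_normalize` and `retrieve_images` and the toy 4-dimensional vectors are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_images(text_embedding: np.ndarray, image_embeddings: np.ndarray, k: int = 3) -> list:
    """Return indices of the k gallery images most similar to the text query (hypothetical helper)."""
    query = l2_normalize(text_embedding[None, :])    # shape (1, D)
    gallery = l2_normalize(image_embeddings)         # shape (N, D)
    scores = (gallery @ query.T).ravel()             # cosine similarity per image, shape (N,)
    return np.argsort(-scores)[:k].tolist()          # indices sorted by descending similarity


# Toy 4-dimensional embeddings standing in for real model outputs (hypothetical values).
gallery = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
query = np.array([0.9, 0.1, 0.0, 0.0])

print(retrieve_images(query, gallery, k=2))  # → [0, 2]
```

Normalizing both sides first means the ranking depends only on embedding direction, which mirrors the normalization step in the Usage code.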

This repository provides the checkpoint in OpenCLIP format. Text inputs should be tokenized using the default OpenCLIP tokenizer for ViT-L-14.

import open_clip

tokenizer = open_clip.get_tokenizer("ViT-L-14")

Files

  • open_clip_pytorch_model.safetensors: OpenCLIP-format model checkpoint
  • inference_example.py: example code for image-text similarity
  • requirements.txt: minimal dependencies

Usage

import torch
import open_clip
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download
from PIL import Image

model_name = "ViT-L-14"
device = "cuda" if torch.cuda.is_available() else "cpu"

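# Build the ViT-L-14 architecture without pretrained weights; the MedPMC-CLIP weights are loaded below.
# `preprocess` is the evaluation-time image transform (resize, center crop, normalize).
model, _, preprocess = open_clip.create_model_and_transforms(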
    model_name,
    pretrained=None,
)

repo_id = "Yale-BIDS-Chen/medpmc-clip-l-14_jun24_v1"

# Download the checkpoint from the Hugging Face Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id=repo_id,
    filename="open_clip_pytorch_model.safetensors",
)

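# Load the weights; strict=True raises an error if any parameter name does not match the architecture.
state_dict = load_file(ckpt_path, device="cpu")
model.load_state_dict(state_dict, strict=True)
model = model.to(device)
model.eval()  # switch to inference mode (disables dropout)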

tokenizer = open_clip.get_tokenizer(model_name)

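# Replace "example.jpg" with the path to your own image; unsqueeze(0) adds a batch dimension.
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0).to(device)
# Candidate text descriptions to compare against the image.
text = tokenizer(["fundus photograph", "chest radiograph", "histopathology image"]).to(device)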

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

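    # L2-normalize so the matrix product below yields cosine similarities.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    # Shape (1, 3): one cosine similarity per text description; higher means a closer match.
    similarity = image_features @ text_features.T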

print(similarity)
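The printed `similarity` tensor holds raw cosine similarities. A common way to turn these into zero-shot classification probabilities, used in the OpenCLIP README example, is to multiply by a temperature factor (100.0 there) and apply a softmax over the text labels. The NumPy sketch below shows that computation on hypothetical similarity values. The helper `zero_shot_probs` is illustrative, and the factor 100.0 is an assumption borrowed from OpenCLIP's example rather than a value confirmed for MedPMC-CLIP.

```python
import numpy as np


def zero_shot_probs(similarity: np.ndarray, scale: float = 100.0) -> np.ndarray:
    """Turn a (num_images, num_labels) cosine-similarity matrix into per-image label probabilities."""
    logits = scale * similarity
    logits = logits - logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)          # softmax over labels


labels = ["fundus photograph", "chest radiograph", "histopathology image"]
# Hypothetical similarities for one image; real values come from the Usage code above.
similarity = np.array([[0.12, 0.31, 0.18]])

probs = zero_shot_probs(similarity)
print(labels[int(probs.argmax(axis=-1)[0])])  # → chest radiograph
```

With the PyTorch tensors from the Usage code, the equivalent is `(100.0 * similarity).softmax(dim=-1)`. OpenCLIP models also store a learned temperature as `model.logit_scale`, so `model.logit_scale.exp()` can be substituted for 100.0 if you prefer the value learned during training.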

Citation

Citation information will be added upon release.

Questions?

For questions or feedback, please contact Hyunjae Kim at hyunjae.kim@yale.edu.
