MedPMC-CLIP
MedPMC-CLIP is a medical vision-language model based on the OpenCLIP ViT-L-14 architecture.
The model was trained on the MedPMC-11M dataset, a carefully curated collection of approximately 11 million image-caption pairs derived from biomedical literature.
MedPMC-CLIP consistently outperforms existing baseline models across a wide range of evaluations, including zero-shot medical image classification on 26 public benchmarks and zero-shot image retrieval on an internal clinical dermatology dataset.
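Both evaluation settings reduce to comparing L2-normalized image and text embeddings by cosine similarity:

- Zero-shot classification picks the caption most similar to an image.
- Text-to-image retrieval ranks a set of images by their similarity to a text query.

The sketch below illustrates only this comparison step. It uses plain Python and made-up 3-dimensional vectors that stand in for the outputs of `model.encode_image` and `model.encode_text`. The vectors, the image names, and the helpers `zero_shot_classify` and `retrieve` are all hypothetical. The actual evaluation protocol (prompts, metrics) is not described in this model card.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity: dot product of the two L2-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

# Hypothetical toy embeddings (NOT real MedPMC-CLIP outputs; real ones are much larger).
text_embeddings = {
    "fundus photograph": [0.9, 0.1, 0.0],
    "chest radiograph": [0.1, 0.9, 0.1],
    "histopathology image": [0.0, 0.2, 0.9],
}
image_embeddings = {
    "img_a": [0.8, 0.2, 0.1],
    "img_b": [0.1, 0.1, 1.0],
    "img_c": [0.2, 0.7, 0.3],
}

def zero_shot_classify(image_vec, text_embeddings):
    """Zero-shot classification: return the label whose text embedding is most similar."""
    return max(text_embeddings, key=lambda label: cosine(image_vec, text_embeddings[label]))

def retrieve(query_vec, image_embeddings, k=2):
    """Text-to-image retrieval: rank images by similarity to the query and keep the top k."""
    ranked = sorted(
        image_embeddings,
        key=lambda name: cosine(query_vec, image_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

print(zero_shot_classify(image_embeddings["img_a"], text_embeddings))  # fundus photograph
print(retrieve(text_embeddings["histopathology image"], image_embeddings))  # ['img_b', 'img_c']
```

With real inputs, the same logic runs on torch tensors: one matrix product of normalized features replaces the per-pair `cosine` calls.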
For additional details on model training and benchmark results, please refer to our paper (coming soon).
This repository provides the checkpoint in OpenCLIP format. Text inputs should be tokenized using the default OpenCLIP tokenizer for ViT-L-14.
import open_clip
tokenizer = open_clip.get_tokenizer("ViT-L-14")
Files
- open_clip_pytorch_model.safetensors: OpenCLIP-format model checkpoint
- inference_example.py: example code for image-text similarity
- requirements.txt: minimal dependencies
Usage
import torch
import open_clip
from safetensors.torch import load_file
from huggingface_hub import hf_hub_download
from PIL import Image
model_name = "ViT-L-14"
device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    model_name,
    pretrained=None,
)
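# Download the MedPMC-CLIP checkpoint from the Hugging Face Hub and load its weights
repo_id = "Yale-BIDS-Chen/medpmc-clip-l-14_jun24_v1"
ckpt_path = hf_hub_download(
    repo_id=repo_id,
    filename="open_clip_pytorch_model.safetensors",
)
state_dict = load_file(ckpt_path, device="cpu")
model.load_state_dict(state_dict, strict=True)  # strict=True: checkpoint keys must match the model exactly
model = model.to(device)
model.eval()  # switch to inference mode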
tokenizer = open_clip.get_tokenizer(model_name)
# Preprocess one image into a batch of size 1 and tokenize the candidate captions
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0).to(device)
text = tokenizer(["fundus photograph", "chest radiograph", "histopathology image"]).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # L2-normalize so the dot product below equals cosine similarity
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    # Shape (num_images, num_texts), here (1, 3)
    similarity = image_features @ text_features.T
print(similarity)
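The printed `similarity` holds raw cosine similarities, with one row per image and one column per caption. To read a row as a probability distribution over the captions, CLIP-style zero-shot classification usually multiplies by a temperature scale and applies a softmax.

- The OpenCLIP README example uses a fixed factor of 100.0, i.e. `(100.0 * image_features @ text_features.T).softmax(dim=-1)`.
- The model's learned `model.logit_scale.exp()` is an alternative.

Whether MedPMC-CLIP's own evaluation used either choice is an assumption not stated in this model card. The sketch below shows only the arithmetic, in plain Python on a hypothetical similarity row. The numbers are not model outputs, and `similarities_to_probs` is an illustrative helper.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def similarities_to_probs(similarity_row, scale=100.0):
    """Convert one row of cosine similarities into probabilities over the captions.

    scale=100.0 mirrors the fixed factor in the OpenCLIP README example (assumption);
    the model's learned logit_scale.exp() could be passed instead.
    """
    return softmax([scale * s for s in similarity_row])

prompts = ["fundus photograph", "chest radiograph", "histopathology image"]
row = [0.21, 0.30, 0.24]  # hypothetical cosine similarities for one image
probs = similarities_to_probs(row)
for prompt, p in zip(prompts, probs):
    print(f"{prompt}: {p:.3f}")
print("predicted:", prompts[probs.index(max(probs))])
```

The scale controls how peaked the distribution is. With `scale=1.0`, the same row yields nearly uniform probabilities (about 0.33, 0.35, and 0.32), while 100.0 puts almost all of the mass on the best-matching caption.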
Citation
Citation information will be added upon release.
Questions?
For questions or feedback, please contact Hyunjae Kim at hyunjae.kim@yale.edu.