---
library_name: open_clip
tags:
- clip
- openclip
- medical
- biomedical
- vision-language
- image-text-retrieval
- medpmc
---

# MedPMC-CLIP

MedPMC-CLIP is a medical vision-language model based on the OpenCLIP `ViT-L-14` architecture.

This repository provides the checkpoint in **OpenCLIP format**. Tokenize text inputs with the default OpenCLIP tokenizer for `ViT-L-14`:

```python
import open_clip

tokenizer = open_clip.get_tokenizer("ViT-L-14")
```

## Files

- `open_clip_pytorch_model.safetensors`: OpenCLIP-format model checkpoint
- `inference_example.py`: example code for image-text similarity
- `export_meta.json`: export metadata
- `requirements.txt`: minimal dependencies

## Usage

```python
import torch
import open_clip
from safetensors.torch import load_file
from PIL import Image

model_name = "ViT-L-14"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the architecture without pretrained weights, then load this checkpoint.
model, _, preprocess = open_clip.create_model_and_transforms(
    model_name,
    pretrained=None,
)
state_dict = load_file("open_clip_pytorch_model.safetensors")
model.load_state_dict(state_dict, strict=True)
model = model.to(device)
model.eval()

tokenizer = open_clip.get_tokenizer(model_name)

# Prepare one image and a set of candidate captions.
image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0).to(device)
text = tokenizer(["fundus photograph", "chest radiograph", "histopathology image"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # L2-normalize so the dot product below is a cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    similarity = image_features @ text_features.T

print(similarity)
```

## Citation

Citation information will be added upon release.
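
## Interpreting similarity scores

The Usage example prints raw cosine similarities, which are hard to compare by eye. One common way to read them is to scale the scores and apply a softmax over the candidate captions. The sketch below shows that step using only the standard library. The helper `rank_labels`, the scale of `100.0`, and the example similarity values are assumptions for illustration and do not come from this repository; the checkpoint's own learned temperature may differ.

```python
import math


def rank_labels(similarities, labels, logit_scale=100.0):
    """Convert one row of cosine similarities into softmax probabilities,
    returned as (label, probability) pairs sorted from most to least likely.

    logit_scale=100.0 is an assumed temperature, not a value taken from
    this checkpoint.
    """
    if len(similarities) != len(labels):
        raise ValueError("similarities and labels must have the same length")

    logits = [logit_scale * s for s in similarities]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


labels = ["fundus photograph", "chest radiograph", "histopathology image"]

# Hypothetical similarity values for illustration only; not real model output.
example_similarities = [0.18, 0.27, 0.21]

ranked = rank_labels(example_similarities, labels)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

With the Usage example, the row for the single input image could be passed in as `similarity[0].tolist()`. The probabilities are relative to the captions supplied, so they only rank those candidates against each other.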