MiniCPM-V-2.6-8B – Stage 3 (Multi-turn)
1. Model Overview
This model is part of a Vision-Language AI system designed for chest X-ray analysis in Vietnamese clinical settings.
The full pipeline consists of 3 stages:
- Stage 1: Findings generation (image → radiology findings)
- Stage 2: Impression generation (image → clinical impression)
- Stage 3: Multi-turn conversation (findings + impression + dialogue)
This repository corresponds to:
- Stage: 3 (Multi-turn)
- Task: Multi-turn reasoning with findings and impression
- Domain: Vietnamese medical imaging (Chest X-ray)
The model supports multi-turn dialogue, where:
- Turn 1: Generate findings
- Turn 2: Generate clinical impression based on previous context
2. Installation
pip install Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 transformers==4.40.0 sentencepiece==0.1.99 decord
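To confirm that the environment matches the versions pinned in the install command above, a check along the following lines may help. It is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical and not part of this repository, and decord is left out because the command does not pin it.

```python
from importlib import metadata

# Versions pinned in the install command above.
PINS = {
    "Pillow": "10.1.0",
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "transformers": "4.40.0",
    "sentencepiece": "0.1.99",
}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Local build tags such as "+cu121" are ignored in the comparison.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, found in check_pins(PINS).items():
        print(f"{name}: expected {PINS[name]}, found {found}")
```

An empty result means every pinned package is installed at the expected version.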
3. Inference
A GPU with bfloat16 support is recommended; the example below loads the model in bfloat16 and moves it to CUDA.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
# Load the fine-tuned weights in bfloat16; trust_remote_code is required
# because MiniCPM-V ships its own modeling code.
model = AutoModel.from_pretrained(
    "THP2903/finetuning_medical_MiniCPM-V-2_6_8B_multiturns",
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()  # inference mode, on the GPU
tokenizer = AutoTokenizer.from_pretrained(
    "THP2903/finetuning_medical_MiniCPM-V-2_6_8B_multiturns",
    trust_remote_code=True,
)
image = Image.open("your_image.jpg").convert("RGB")
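# Turn 1: Findings
# The prompt stays in Vietnamese, the model's working language. In English:
# "X-ray of a male patient, 48 years old, PA view. Describe the patient information."
question1 = "Ảnh chụp xray bệnh nhân nam, 48 tuổi PA. Mô tả thông tin benh nhân."
msgs = [
    {
        "role": "user",
        "content": [image, question1],  # the image is passed inside the message
    }
]
res1 = model.chat(
    image=None,  # None because the image is already in msgs
    msgs=msgs,
    tokenizer=tokenizer,
)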
print("Turn 1:", res1)
# Turn 2: Impression (append the history manually)
msgs.append({"role": "assistant", "content": res1})
# In English: "What is the diagnosis?"
question2 = "Kết luận bệnh gì?"
msgs.append({"role": "user", "content": question2})
res2 = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
)
print("Turn 2:", res2)
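The manual history handling above can be wrapped in a small helper so that each turn is a single call. This is a sketch, not part of the repository: the name `ask` is hypothetical, and it assumes only what the example above shows, namely that `model.chat(image=None, msgs=..., tokenizer=...)` returns the reply as a string.

```python
def ask(model, tokenizer, msgs, question, image=None):
    """Run one chat turn and record both sides of it in `msgs`."""
    # Only the first turn needs the image; later turns rely on the history.
    content = [question] if image is None else [image, question]
    msgs.append({"role": "user", "content": content})
    # The image travels inside `msgs`, so image=None is passed to chat,
    # as in the example above.
    answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
    # Store the reply so the next turn sees the full conversation.
    msgs.append({"role": "assistant", "content": answer})
    return answer
```

With the objects from the example above and an initially empty `msgs` list, `ask(model, tokenizer, msgs, question1, image=image)` followed by `ask(model, tokenizer, msgs, question2)` would run the same two turns.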
4. Notes
- Input must be a chest X-ray image
- Turn 1 generates findings
- Turn 2 generates clinical impression using previous conversation context
- Conversation history is maintained manually through the msgs list: each assistant reply must be appended before the next user turn
- This model follows the original MiniCPM-V multi-turn inference pipeline
- For best performance, consider using Qwen2-VL-7B
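Because the history is a plain Python list, it is easy to pass a malformed conversation by accident. The sketch below checks a `msgs` list before it is sent to the model. The function name `check_history` is hypothetical, and the rules it enforces (strict user/assistant alternation, a non-string image object in the first user turn) are assumptions drawn from the example in this card, not a documented requirement of the model.

```python
def check_history(msgs):
    """Raise ValueError if `msgs` does not look like a valid conversation."""
    if not msgs:
        raise ValueError("history is empty")
    for i, msg in enumerate(msgs):
        # Assumption: turns alternate strictly, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if msg.get("role") != expected:
            raise ValueError(
                f"turn {i}: expected role {expected!r}, got {msg.get('role')!r}"
            )
    first = msgs[0]["content"]
    items = first if isinstance(first, list) else [first]
    # Assumption: anything that is not a string is the image object.
    if all(isinstance(item, str) for item in items):
        raise ValueError("first user turn carries no image")
    if msgs[-1]["role"] != "user":
        raise ValueError("last message must be a user turn before calling chat")
```

Calling `check_history(msgs)` just before `model.chat(...)` would surface a missing assistant reply or a missing image as an explicit error.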