MiniCPM-V-2.6-8B – Stage 3 (Multi-turn)

1. Model Overview

This model is part of a Vision-Language AI system designed for chest X-ray analysis in Vietnamese clinical settings.

The full pipeline consists of 3 stages:

  • Stage 1: Findings generation (image → radiology findings)
  • Stage 2: Impression generation (image → clinical impression)
  • Stage 3: Multi-turn conversation (findings + impression + dialogue)

This repository corresponds to:

  • Stage: 3 (Multi-turn)
  • Task: Multi-turn reasoning with findings and impression
  • Domain: Vietnamese medical imaging (Chest X-ray)

The model supports multi-turn dialogue, where:

  • Turn 1: Generate findings
  • Turn 2: Generate clinical impression based on previous context

2. Installation

pip install Pillow==10.1.0 torch==2.1.2 torchvision==0.16.2 transformers==4.40.0 sentencepiece==0.1.99 decord
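
To confirm that the environment matches the pinned versions above, a check along the following lines can be run after installation. This is a minimal sketch using only the standard library: the names PINNED and check_environment are hypothetical helpers, not part of this repository, and any local build suffix such as "+cu121" on torch is ignored when comparing.

```python
from importlib import metadata

# Pins copied from the pip command above; decord has no pinned version there.
PINNED = {
    "Pillow": "10.1.0",
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "transformers": "4.40.0",
    "sentencepiece": "0.1.99",
    "decord": None,
}


def check_environment(pins=PINNED):
    """Return {package: (installed_version_or_None, expected_or_None, ok)}."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop a local build tag such as "+cu121" before comparing.
        base = installed.split("+")[0] if installed else None
        ok = base is not None and (expected is None or base == expected)
        report[name] = (installed, expected, ok)
    return report


if __name__ == "__main__":
    for name, (installed, expected, ok) in check_environment().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={expected} [{status}]")
```

A mismatch does not necessarily mean inference will fail; it only flags a difference from the versions listed in this card.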

3. Inference

A CUDA GPU with bfloat16 support is recommended.

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "THP2903/finetuning_medical_MiniCPM-V-2_6_8B_multiturns",
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16
)

model = model.eval().cuda()

tokenizer = AutoTokenizer.from_pretrained(
    "THP2903/finetuning_medical_MiniCPM-V-2_6_8B_multiturns",
    trust_remote_code=True
)

image = Image.open("your_image.jpg").convert("RGB")

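# Turn 1: Findings
# Prompt (Vietnamese): "X-ray image of a male patient, 48 years old, PA view. Describe the patient information."
question1 = "Ảnh chụp xray bệnh nhân nam, 48 tuổi PA. Mô tả thông tin benh nhân."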

msgs = [
    {
        "role": "user",
        "content": [image, question1]
    }
]

# The image travels inside msgs, so the separate image argument stays None.
res1 = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)

print("Turn 1:", res1)

# Turn 2: Impression (append history manually)
msgs.append(
    {
        "role": "assistant",
        "content": res1
    }
)

# Prompt (Vietnamese): "What is the diagnosis?"
question2 = "Kết luận bệnh gì?"

msgs.append(
    {
        "role": "user",
        "content": question2
    }
)

res2 = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)

print("Turn 2:", res2)

4. Notes

  • Input must be a chest X-ray image
  • Turn 1 generates findings
  • Turn 2 generates clinical impression using previous conversation context
  • Conversation history is maintained via the msgs list: each assistant reply and each new user question is appended before the next call
  • This model follows the original MiniCPM-V multi-turn inference pipeline
  • For best performance, consider using Qwen2-VL-7B
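
The manual history handling described in these notes can be wrapped in a small helper. The sketch below is one possible way to do it, not part of this repository: ChatSession and chat_fn are hypothetical names, and chat_fn is assumed to be any callable that takes the full msgs list and returns the reply text, for example a lambda around model.chat(image=None, msgs=msgs, tokenizer=tokenizer).

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class ChatSession:
    """Keeps the msgs history for one image across turns (hypothetical helper)."""

    def __init__(self, chat_fn: Callable[[List[Message]], str], image: Any):
        self.chat_fn = chat_fn  # assumed wrapper around model.chat
        self.image = image
        self.msgs: List[Message] = []

    def ask(self, question: str) -> str:
        # Only the first user turn carries the image, as in the inference example.
        content = [self.image, question] if not self.msgs else question
        self.msgs.append({"role": "user", "content": content})
        answer = self.chat_fn(self.msgs)
        # Store the reply so the next turn sees the full conversation.
        self.msgs.append({"role": "assistant", "content": answer})
        return answer
```

With the model and tokenizer loaded as in section 3, usage would look like: session = ChatSession(lambda m: model.chat(image=None, msgs=m, tokenizer=tokenizer), image), followed by session.ask(question1) for the findings and session.ask(question2) for the impression.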