DynEval Evaluator
This repository contains the DynEval evaluator models, fine-tuned from Qwen3-VL vision-language models for multimodal evaluation workflows.
Two evaluator checkpoints are included:
- DynEval Evaluator 2B: Qwen3-VL 2B fine-tuned checkpoint, uploaded under DynEval-2B
- DynEval Evaluator 4B: Qwen3-VL 4B fine-tuned checkpoint, uploaded under DynEval-4B
Both checkpoints are saved in Hugging Face transformers format and can be loaded with Qwen3VLForConditionalGeneration.
Model Variants
| Variant | Architecture | Checkpoint | Precision | Training Epochs | Global Step | Last Logged Loss |
|---|---|---|---|---|---|---|
| DynEval Evaluator 2B | Qwen3VLForConditionalGeneration | 315 | bfloat16 | 3.0 | 315 | 0.5989 |
| DynEval Evaluator 4B | Qwen3VLForConditionalGeneration | 471 | bfloat16 | 3.0 | 471 | 0.5784 |
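As a quick sanity check on the table, dividing the global step by the number of epochs gives the training steps per epoch. This is a back-of-the-envelope sketch: it assumes each checkpoint was saved exactly at the end of epoch 3 and that all epochs have the same number of steps. The variable names are illustrative.

```python
# Values copied from the Model Variants table above.
variants = {
    "DynEval Evaluator 2B": {"epochs": 3.0, "global_step": 315},
    "DynEval Evaluator 4B": {"epochs": 3.0, "global_step": 471},
}

# Assumption: equal-sized epochs, checkpoint saved at the end of the last one.
steps_per_epoch = {
    name: info["global_step"] / info["epochs"] for name, info in variants.items()
}

for name, steps in steps_per_epoch.items():
    print(f"{name}: {steps:.0f} training steps per epoch")
```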
Model Details
DynEval Evaluator 2B
- Model type: qwen3_vl
- Tokenizer: Qwen2Tokenizer
- Tokenizer max length: 65,536
- Text context config: 262,144 max position embeddings
- Text hidden size: 2,048
- Text layers: 28
- Attention heads: 16
- KV heads: 8
- Vision encoder depth: 24
- Vision hidden size: 1,024
- Vision patch size: 16
DynEval Evaluator 4B
- Model type: qwen3_vl
- Tokenizer: Qwen2Tokenizer
- Tokenizer max length: 65,536
- Text context config: 262,144 max position embeddings
- Text hidden size: 2,560
- Text layers: 36
- Attention heads: 32
- KV heads: 8
- Vision encoder depth: 24
- Vision hidden size: 1,024
- Vision patch size: 16
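Two quantities follow directly from the numbers above: how many query heads share each key/value head, and how many patches an image is cut into at patch size 16. The sketch below is illustrative only. The 448x448 image size is an arbitrary example, the helper names are made up for this card, and the patch count is the raw grid before any resizing or token merging the processor and vision encoder may apply.

```python
# Attention settings copied from the Model Details lists above.
configs = {
    "2B": {"attention_heads": 16, "kv_heads": 8},
    "4B": {"attention_heads": 32, "kv_heads": 8},
}


def query_heads_per_kv_head(attention_heads: int, kv_heads: int) -> int:
    """Number of query heads that share one key/value head."""
    if attention_heads % kv_heads != 0:
        raise ValueError("attention_heads must be a multiple of kv_heads")
    return attention_heads // kv_heads


def raw_patch_count(height: int, width: int, patch_size: int = 16) -> int:
    """Count patch_size x patch_size patches, assuming the sides divide evenly."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    return (height // patch_size) * (width // patch_size)


gqa_ratio = {name: query_heads_per_kv_head(**cfg) for name, cfg in configs.items()}
print(gqa_ratio)
print(raw_patch_count(448, 448))
```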
Special Tokens
The evaluator checkpoints include the following task tokens:
<|T2IA|>
<|IQA|>
<|EVALUATION|>
Use the task token that matches your evaluation setting.
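One way to avoid typos in the task token is to build the prompt text through a small helper. The sketch below is hypothetical: `with_task_token` is not part of this repository, and the token-then-newline layout simply mirrors the Quick Start examples in this card. The exact prompt format used during fine-tuning is not documented here, so treat the layout as an assumption.

```python
# Task tokens listed in this model card.
TASK_TOKENS = ("<|T2IA|>", "<|IQA|>", "<|EVALUATION|>")


def with_task_token(task_token: str, instruction: str) -> str:
    """Prefix an instruction with a task token (hypothetical helper).

    Assumption: the token goes on its own first line, as in the
    Quick Start examples of this card.
    """
    if task_token not in TASK_TOKENS:
        raise ValueError(
            f"unknown task token {task_token!r}; expected one of {TASK_TOKENS}"
        )
    return f"{task_token}\n{instruction.strip()}"


print(with_task_token("<|EVALUATION|>", "Evaluate the following response."))
```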
Intended Use
DynEval Evaluator is intended for research use in multimodal evaluation, particularly for evaluating text-to-image outputs and image question-answering outputs.
Example use cases:
- text-to-image alignment evaluation
- image-question answering evaluation
- multimodal response scoring
- visual reasoning evaluation
- evaluator-based comparison of image generation model outputs
These models should be used with the same prompt format and task tokens used during fine-tuning.
Quick Start
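Install recent versions of torch, transformers, accelerate, and pillow. The transformers release must include Qwen3VLForConditionalGeneration; accelerate is needed for device_map="auto", and pillow for loading images.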
pip install torch transformers accelerate pillow
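To confirm the environment before loading a checkpoint, the installed versions can be listed with the standard library. This is a minimal sketch; `installed_versions` is an illustrative helper and it only reports versions, it does not check any minimum version.

```python
from importlib import metadata

# Distribution names from the pip command above.
REQUIRED = ("torch", "transformers", "accelerate", "pillow")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```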
Load the 2B Evaluator
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
repo_id = "vcl-iisc/DynEval-Evaluator"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder="DynEval-2B",
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    repo_id,
    subfolder="DynEval-2B",
)
Load the 4B Evaluator
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
repo_id = "vcl-iisc/DynEval-Evaluator"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder="DynEval-4B",
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    repo_id,
    subfolder="DynEval-4B",
)
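Since the two loading snippets differ only in the subfolder, they can be folded into one function. The following is a sketch, not an official API of this repository: `subfolder_for` and `load_evaluator` are hypothetical names, and the heavy imports are deferred so the helper can be imported without torch or transformers installed.

```python
REPO_ID = "vcl-iisc/DynEval-Evaluator"
# Subfolder names as listed in this model card.
SUBFOLDERS = {"2B": "DynEval-2B", "4B": "DynEval-4B"}


def subfolder_for(size: str) -> str:
    """Return the repository subfolder for a variant size such as '2B' or '4b'."""
    key = size.upper()
    if key not in SUBFOLDERS:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(SUBFOLDERS)}")
    return SUBFOLDERS[key]


def load_evaluator(size: str = "2B"):
    """Load (model, processor) for the chosen variant (hypothetical helper)."""
    # Imported lazily: these are only needed when a model is actually loaded.
    import torch
    from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

    subfolder = subfolder_for(size)
    model = Qwen3VLForConditionalGeneration.from_pretrained(
        REPO_ID,
        subfolder=subfolder,
        dtype=torch.bfloat16,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(REPO_ID, subfolder=subfolder)
    return model, processor
```

Calling `load_evaluator("4B")` would then download and load the 4B checkpoint and its processor.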
Text-Only Evaluation Example
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
repo_id = "vcl-iisc/DynEval-Evaluator"
subfolder = "DynEval-2B"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder=subfolder,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "<|EVALUATION|>\nEvaluate the following response for the given prompt.",
            }
        ],
    }
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(
    text=[text],
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )
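# generate() returns the prompt tokens followed by the new tokens;
# slice off the prompt so only the evaluator's answer is decoded.
prompt_length = inputs["input_ids"].shape[1]
output = processor.batch_decode(
    generated_ids[:, prompt_length:],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
print(output)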
Image + Text Evaluation Example
from PIL import Image
import torch
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
repo_id = "vcl-iisc/DynEval-Evaluator"
subfolder = "DynEval-2B"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    repo_id,
    subfolder=subfolder,
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(repo_id, subfolder=subfolder)
image = Image.open("example.jpg").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {
                "type": "text",
                "text": "<|IQA|>\nEvaluate or answer the question for this image.",
            },
        ],
    }
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
    )
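# generate() returns the prompt tokens followed by the new tokens;
# slice off the prompt so only the evaluator's answer is decoded.
prompt_length = inputs["input_ids"].shape[1]
output = processor.batch_decode(
    generated_ids[:, prompt_length:],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)[0]
print(output)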
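The text-only and image examples build the same `messages` structure and differ only in whether an image entry comes first. A small helper can make that explicit. This is a sketch under assumptions: `build_messages` is a hypothetical name, and the layout (optional image entry, then task token and instruction separated by a newline) is copied from the two examples above rather than from a documented fine-tuning format.

```python
from typing import Any, Optional


def build_messages(
    task_token: str, instruction: str, image: Optional[Any] = None
) -> list:
    """Build a single-turn chat message list (hypothetical helper).

    `image` can be any object the processor accepts, for example a PIL image;
    pass None for a text-only request.
    """
    content = []
    if image is not None:
        # The image entry precedes the text, as in the image example above.
        content.append({"type": "image", "image": image})
    content.append({"type": "text", "text": f"{task_token}\n{instruction}"})
    return [{"role": "user", "content": content}]


text_only = build_messages("<|EVALUATION|>", "Evaluate the following response.")
print(text_only[0]["content"][0]["text"])
```

The returned list can be passed to `processor.apply_chat_template` in place of the hand-written `messages` in either example.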