Instructions for using OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR with Python libraries and local serving apps (vLLM, SGLang, Docker).

Libraries

How to use OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR")
model = AutoModelForMultimodalLM.from_pretrained("OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```
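The curl request above sends a remote image with a captioning prompt. For OCR you will more likely want to send a local file together with the `<ocr>` prompt used in the Transformers usage example further down. The sketch below does this using only the Python standard library. It encodes the image as a base64 data URL, which vLLM's OpenAI-compatible server accepts in `image_url`. It rests on a few assumptions. First, the server applies the model's chat template so that `<ocr>` behaves as it does in Transformers. Second, `skip_special_tokens` is a vLLM-specific request field, and we assume SGLang accepts it as well. Third, the helper names (`image_to_data_url`, `build_ocr_payload`, `request_ocr`) are hypothetical. The 2,304-pixel upscaling recommended in the Usage section would have to be applied to the file before encoding it.

```python
import base64
import json
import mimetypes
import urllib.request

# Assumed endpoint: vLLM's default port is 8000; the SGLang examples use 30000.
SERVER_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR"


def image_to_data_url(path: str) -> str:
    """Read a local image and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime or 'image/jpeg'};base64,{encoded}"


def build_ocr_payload(image_url: str, max_tokens: int = 1024) -> dict:
    """Build a chat request with the image first and the <ocr> prompt second."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": "<ocr>"},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding for a deterministic transcription
        # vLLM-specific extra field (assumed to work on SGLang too): keeps the
        # <char>/<bbox> markup in the response if those tags are special tokens.
        "skip_special_tokens": False,
    }


def request_ocr(image_url: str, server_url: str = SERVER_URL) -> str:
    """POST the request to an OpenAI-compatible server and return the generated text."""
    data = json.dumps(build_ocr_payload(image_url)).encode("utf-8")
    req = urllib.request.Request(
        server_url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(request_ocr(image_to_data_url("/path/to/image.jpg")))
```

The image comes before the text in `content` to match the order of the Transformers `conversation`. Keeping the payload builder separate from the network call also makes the request easy to inspect or log.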

Use Docker Model Runner

```shell
docker model run hf.co/OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR
```

SGLang

How to use OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OpenLLM-Korea/VARCO-VISION-2.0-1.7B-OCR" \
        --host 0.0.0.0 \
        --port 30000
```
Once the container is running, it serves the same OpenAI-compatible API on port 30000. The curl request from the pip-based SGLang example above therefore works unchanged.


VARCO-VISION-2.0-1.7B-OCR

Introduction

VARCO-VISION-2.0-1.7B-OCR is a lightweight, OCR-specialized model derived from VARCO-VISION-2.0-1.7B and designed for efficient, accurate text recognition in real-world scenarios. Conventional vision-language models (VLMs) primarily transcribe the visible text. This model performs both recognition and spatial localization: it predicts a bounding box around each character, producing structured, layout-aware OCR output.

The model supports both Korean and English, making it well-suited for multilingual environments where mixed-script documents are common. Each recognized character is paired with its precise position in the image, formatted as <char>{characters}</char><bbox>{x1}, {y1}, {x2}, {y2}</bbox>, where the coordinates correspond to the top-left (x1, y1) and bottom-right (x2, y2) corners of the character's bounding box.
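Because every character is returned in this fixed markup, the output can be converted into structured records with a regular expression. The sketch below is minimal and assumes the output is a sequence of `<char>…</char><bbox>…</bbox>` pairs, possibly surrounded by other special tokens, which the pattern ignores. `parse_ocr_output` and `OcrItem` are hypothetical names. Coordinates are returned as floats, with no assumption about their scale.

```python
import re
from dataclasses import dataclass

# One <char>...</char><bbox>x1, y1, x2, y2</bbox> pair; whitespace around numbers is tolerated.
_PAIR = re.compile(
    r"<char>(.*?)</char>\s*<bbox>\s*([-\d.]+)\s*,\s*([-\d.]+)\s*,"
    r"\s*([-\d.]+)\s*,\s*([-\d.]+)\s*</bbox>",
    re.DOTALL,
)


@dataclass
class OcrItem:
    text: str
    x1: float  # top-left x
    y1: float  # top-left y
    x2: float  # bottom-right x
    y2: float  # bottom-right y


def parse_ocr_output(output: str) -> list[OcrItem]:
    """Return the (text, bbox) pairs in the order they appear in the model output."""
    items = []
    for m in _PAIR.finditer(output):
        x1, y1, x2, y2 = (float(v) for v in m.groups()[1:])
        items.append(OcrItem(m.group(1), x1, y1, x2, y2))
    return items
```

Text that does not match the pattern, such as an end-of-turn token, is skipped rather than raising an error. This keeps the parser tolerant of whatever the decoder emits around the markup.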

VARCO-VISION-2.0-14B shows strong OCR capabilities as part of its broader multimodal reasoning skills, but deploying a 14B-parameter model for a single task can be computationally inefficient. VARCO-VISION-2.0-1.7B-OCR addresses this with a task-specific design built on a much smaller language backbone (Qwen3-1.7B instead of Qwen3-14B). It retains high accuracy while substantially reducing resource requirements, which makes it well suited to real-time or resource-constrained applications.

🚨News🎙️

📰 2025-07-28: We released VARCO-VISION-2.0-1.7B-OCR at link
📰 2025-07-28: We released VARCO-VISION-2.0-1.7B at link
📰 2025-07-18: Updated the checkpoint of VARCO-VISION-2.0-14B for improved performance.
📰 2025-07-16: We released VARCO-VISION-2.0-14B at link
📰 2025-07-16: We released GME-VARCO-VISION-Embedding at link

VARCO-VISION-2.0 Family

| Model Name | Base Models (Vision / Language) | HF Link |
|---|---|---|
| VARCO-VISION-2.0-14B | siglip2-so400m-patch16-384 / Qwen3-14B | link |
| VARCO-VISION-2.0-1.7B | siglip2-so400m-patch16-384 / Qwen3-1.7B | link |
| VARCO-VISION-2.0-1.7B-OCR | siglip2-so400m-patch16-384 / Qwen3-1.7B | link |
| GME-VARCO-VISION-Embedding | Qwen2-VL-7B-Instruct | link |

Model Architecture

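VARCO-VISION-2.0 follows the LLaVA-OneVision architecture, pairing a SigLIP2 vision encoder (siglip2-so400m-patch16-384) with a Qwen3 language model, as listed in the family table above.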

Evaluation

OCR Benchmark

| Benchmark | CLOVA OCR | PaddleOCR | EasyOCR | VARCO-VISION-2.0-1.7B-OCR |
|---|---:|---:|---:|---:|
| CORD | 93.9 | 91.4 | 77.8 | 95.6 |
| ICDAR2013 | 94.4 | 92.0 | 85.0 | 95.5 |
| ICDAR2015 | 84.1 | 73.7 | 57.9 | 75.4 |

Usage

To use this model, we recommend installing transformers version 4.53.1 or higher. For best results, we also recommend upscaling input images so that the longer side is at least 2,304 pixels.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_name = "NCSOFT/VARCO-VISION-2.0-1.7B-OCR"
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)

# Image.open expects a filesystem path, not a file:// URI
image = Image.open("/path/to/image.jpg")

# Upscale small images so the longer side is at least 2,304 px (aspect ratio preserved)
w, h = image.size
target_size = 2304
if max(w, h) < target_size:
    scaling_factor = target_size / max(w, h)
    new_w = int(w * scaling_factor)
    new_h = int(h * scaling_factor)
    image = image.resize((new_w, new_h))

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "<ocr>"},
        ],
    },
]

# .to(device, dtype) casts only floating-point tensors (pixel values) to float16;
# token ids keep their integer dtype and are just moved to the device
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device, torch.float16)

# Each character produces several output tokens, so text-dense images may need a larger limit
generate_ids = model.generate(**inputs, max_new_tokens=1024)
# Drop the prompt tokens so only the newly generated text is decoded
generate_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generate_ids)
]
# skip_special_tokens=False keeps the raw <char>/<bbox> markup in the decoded text
output = processor.decode(generate_ids_trimmed[0], skip_special_tokens=False)
print(output)
```
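A quick way to check the localization is to draw the predicted boxes on the image. This card does not state the coordinate scale, so the hypothetical `draw_ocr_boxes` helper below requires you to choose one. With `normalized=True` it treats coordinates as fractions of the image width and height. With `normalized=False` it treats them as pixels. If they are pixels, they presumably refer to the image that was actually fed to the model (the upscaled `image` above), so draw on that image. This is a sketch under those assumptions; inspect a few outputs before relying on either mode.

```python
import re

from PIL import Image, ImageDraw

# One <char>...</char><bbox>...</bbox> pair; group 2 holds the comma-separated coordinates.
_PAIR = re.compile(r"<char>(.*?)</char>\s*<bbox>([^<]*)</bbox>", re.DOTALL)


def draw_ocr_boxes(image: Image.Image, output: str, *, normalized: bool) -> Image.Image:
    """Return an RGB copy of `image` with a red outline around every predicted box.

    normalized=True: coordinates are fractions of width/height (an assumption to verify).
    normalized=False: coordinates are already pixel positions in `image`.
    """
    canvas = image.convert("RGB")  # convert() returns a new image; the input is untouched
    draw = ImageDraw.Draw(canvas)
    w, h = canvas.size
    for _text, coords in _PAIR.findall(output):
        x1, y1, x2, y2 = (float(v) for v in coords.split(","))
        if normalized:
            x1, x2 = x1 * w, x2 * w
            y1, y2 = y1 * h, y2 * h
        draw.rectangle(
            [round(x1), round(y1), round(x2), round(y2)], outline=(255, 0, 0), width=2
        )
    return canvas


# Usage, continuing from the example above (pick `normalized` after inspecting `output`):
# draw_ocr_boxes(image, output, normalized=True).save("ocr_boxes.png")
```

`normalized` is keyword-only and has no default on purpose. Silently picking the wrong scale would place the boxes in the wrong spot without raising any error.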