
Libraries

How to use NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct with PEFT:

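from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration

# Qwen2-VL is a vision-language model, so load the base with its dedicated class
base_model = Qwen2VLForConditionalGeneration.from_pretrained("unsloth/qwen2-vl-2b-instruct-unsloth-bnb-4bit")
# Attach the fine-tuned adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct")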

Transformers

How to use NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct", dtype="auto")

Local Apps

vLLM

How to use NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
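
For OCR the page image is usually a local file rather than a public URL. The sketch below builds the same chat request as the curl call above but inlines the image as a base64 data URL, which OpenAI-compatible servers generally accept in the `image_url` field. The helper names `build_ocr_request` and `send_request` are hypothetical, and the default base URL assumes the vLLM server started above is listening on port 8000.

```python
import base64
import json
import urllib.request


def build_ocr_request(image_bytes, model, prompt, mime_type="image/png"):
    """Build an OpenAI-style chat payload with the image inlined as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def send_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

To use it, read the page image as bytes, pass the model name "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct" and the OCR prompt from the "How to Use" section to `build_ocr_request`, then hand the result to `send_request`.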

Use Docker

docker model run hf.co/NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct

SGLang

How to use NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct" \
        --host 0.0.0.0 \
        --port 30000

Unsloth Studio

How to use NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct to start chatting

Load model with FastModel

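# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel

# Load the model and its tokenizer with a 2048-token context window
model, tokenizer = FastModel.from_pretrained(
    model_name="NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct",
    max_seq_length=2048,
)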


Qari-OCR-0.1-VL-2B-Instruct Model

Model Overview

This model is a fine-tuned version of unsloth/Qwen2-VL-2B-Instruct on an Arabic OCR dataset. It is optimized to perform Arabic Optical Character Recognition (OCR) for full-page text.

It is described in detail in the paper QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation.

Model Details

Base Model: Qwen2 VL
Fine-tuning Dataset: Arabic OCR dataset
Objective: Extract full-page Arabic text with high accuracy
Languages: Arabic
Tasks: OCR (Optical Character Recognition)
Dataset size: 5000 records
Epochs: 1

Performance Evaluation

The model has been evaluated on standard OCR metrics, including Word Error Rate (WER), Character Error Rate (CER), and BLEU score.
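
As a reminder of what WER and CER measure, here is a minimal sketch that computes both from the Levenshtein edit distance between a reference and a predicted transcription. It is an illustration under the standard definitions, not the evaluation script used for the reported numbers; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete a reference token
                    curr[j - 1] + 1,     # insert a hypothesis token
                    prev[j - 1] + cost,  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Because the denominator is the reference length, both rates can exceed 1.0 when a prediction contains many more tokens than the reference.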

Metrics

| Model | WER ↓ | CER ↓ | BLEU ↑ |
|---|---|---|---|
| Qari v0.1 Model | 0.068 | 0.019 | 0.860 |
| Qwen2 VL 2B | 1.344 | 1.191 | 0.201 |
| EasyOCR | 0.908 | 0.617 | 0.152 |
| Tesseract OCR | 0.428 | 0.226 | 0.410 |

Key Results

WER: 0.068 (93.2% word accuracy)
CER: 0.019 (98.1% character accuracy)
BLEU: 0.860

Performance Comparison

The Fine-Tuned Model outperforms other solutions with:

95% reduction in WER compared to Base Model
98% reduction in CER compared to Base Model
328% improvement in BLEU score compared to Base Model
84% lower WER than Tesseract OCR
92% lower WER than EasyOCR
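
The percentages above can be rederived from the metrics table. The short sketch below does the arithmetic; the dictionary layout and function names are illustrative only, while the numbers are copied from the table.

```python
# Scores copied from the metrics table
scores = {
    "Qari v0.1 Model": {"wer": 0.068, "cer": 0.019, "bleu": 0.860},
    "Qwen2 VL 2B": {"wer": 1.344, "cer": 1.191, "bleu": 0.201},
    "EasyOCR": {"wer": 0.908, "cer": 0.617, "bleu": 0.152},
    "Tesseract OCR": {"wer": 0.428, "cer": 0.226, "bleu": 0.410},
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline` (error metrics)."""
    return (baseline - value) / baseline * 100


def relative_gain(baseline, value):
    """Percentage by which `value` is higher than `baseline` (BLEU)."""
    return (value - baseline) / baseline * 100


qari = scores["Qari v0.1 Model"]
base = scores["Qwen2 VL 2B"]

print(f"WER vs base:      {relative_reduction(base['wer'], qari['wer']):.1f}%")
print(f"CER vs base:      {relative_reduction(base['cer'], qari['cer']):.1f}%")
print(f"BLEU vs base:     {relative_gain(base['bleu'], qari['bleu']):.1f}%")
for name in ("Tesseract OCR", "EasyOCR"):
    reduction = relative_reduction(scores[name]["wer"], qari["wer"])
    print(f"WER vs {name}: {reduction:.1f}%")
```

The computed values agree with the rounded figures listed above.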

Performance Comparison Charts

[Chart: WER & CER Comparison]

[Chart: BLEU Score Comparison]

Limitations

While the Arabic OCR model demonstrates strong performance under specific conditions, it has several limitations:

Font Dependency: The model was trained using a limited set of fonts (Almarai-Regular, Amiri-Regular, Cairo-Regular, Tajawal-Regular, and NotoNaskhArabic-Regular). As a result, its accuracy may degrade when processing text in other fonts, particularly decorative or stylized typefaces.
Font Size Restriction: Training was conducted with a fixed font size of 16. Variations in font size, especially very small or large text, may reduce recognition accuracy.
Diacritics Exclusion: The model does not support Arabic diacritics (Tashkeel). Text that relies on diacritics for disambiguation may not be correctly recognized.
Lack of Handwriting Support: The model is not trained to recognize handwritten text, limiting its applicability to printed documents only.
Full-Page Processing: The model was trained on full-page text recognition, which may impact its performance on segmented text, cropped sections, or text within complex layouts such as tables and multi-column formats.

These limitations should be considered when deploying the model in real-world applications to ensure optimal performance.
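
Since the model does not output diacritics, ground-truth text that contains Tashkeel would count as errors when scoring predictions unless it is normalised first. The sketch below strips the common Arabic diacritic marks (U+064B to U+0652, plus the superscript alef U+0670) from a string. It is one possible normalisation under that assumption, not necessarily the preprocessing used by the authors, and `strip_tashkeel` is a hypothetical helper name.

```python
import re

# Fathatan through sukun (U+064B..U+0652) plus superscript alef (U+0670)
TASHKEEL_PATTERN = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(text):
    """Return `text` with Arabic diacritic marks removed; letters are untouched."""
    return TASHKEEL_PATTERN.sub("", text)


# Example: a fully vowelled word reduced to its bare letters
vowelled = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
print(strip_tashkeel(vowelled))
```

Applying the same function to both the reference and the prediction keeps the comparison symmetric.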

How to Use

Try Qari - Google Colab

You can load this model using the transformers and qwen_vl_utils libraries:

!pip install transformers qwen_vl_utils "accelerate>=0.26.0" peft -U
!pip install -U bitsandbytes

from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
import torch
import os
from qwen_vl_utils import process_vision_info

# Load the fine-tuned model and its processor
model_name = "NAMAA-Space/Qari-OCR-0.1-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)
max_tokens = 2000

prompt = "Below is the image of one page of a document, as well as some raw textual content that was previously extracted for it. Just return the plain text representation of this document as if you were reading it naturally. Do not hallucinate."

# `image` must be a PIL image of the page to transcribe.
# "page.jpg" is a placeholder path; replace it with your own file.
image = Image.open("page.jpg")
src = "image.png"  # temporary copy handed to the processor, deleted below
image.save(src)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{src}"},
            {"type": "text", "text": prompt},
        ],
    }
]
# Render the chat template and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move inputs to wherever device_map placed the model
inputs = inputs.to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=max_tokens)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
os.remove(src)  # delete the temporary copy
print(output_text)

License

This model follows the licensing terms of the original Qwen2 VL model. Please review the terms before using it commercially.

Citation

If you use this model in your research, please cite:

@article{wasfy2025qari,
  title={QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation},
  author={Wasfy, Ahmed and Nacar, Omer and Elkhateb, Abdelakreem and Reda, Mahmoud and Elshehy, Omar and Ammar, Adel and Boulila, Wadii},
  journal={arXiv preprint arXiv:2506.02295},
  year={2025}
}