---
license: apache-2.0
pipeline_tag: image-text-to-text
language:
- en
- fr
- de
- es
- it
- nl
- pt
- sv
- da
- zh
- ja
library_name: transformers
tags:
- ocr
- document-understanding
- vision-language
- pdf
- tables
- forms
---
<div align="center">
<img src="lightonocr-banner.png" alt="LightOnOCR-2-1B Banner" width="600"/>
</div>
---
<div align="center">
[Website](https://lighton.ai)
[LinkedIn](https://www.linkedin.com/company/lighton/)
[X](https://x.com/LightOnIO)
[Paper](https://arxiv.org/pdf/2601.14251) | [Blog](https://huggingface.co/blog/lightonai/lightonocr-2) | [Demo](https://huggingface.co/spaces/lightonai/LightOnOCR-2-1B-Demo) | [Dataset](https://huggingface.co/datasets/lightonai/LightOnOCR-mix-0126) | [Finetuning](https://colab.research.google.com/drive/1WjbsFJZ4vOAAlKtcCauFLn_evo5UBRNa?usp=sharing)
</div>
# LightOnOCR-2-1B
**Best OCR model.** LightOnOCR-2-1B is **[LightOn's](https://lighton.ai)** flagship OCR model, refined with RLVR training for maximum accuracy. We recommend this variant for most OCR tasks.
## About LightOnOCR-2
LightOnOCR-2 is an efficient end-to-end 1B-parameter vision-language model for converting documents (PDFs, scans, images) into clean, naturally ordered text without relying on brittle pipelines. This second version is trained on a larger and higher-quality corpus with stronger French, arXiv, and scan coverage, improved LaTeX handling, and cleaner normalization. LightOnOCR-2 achieves state-of-the-art performance on OlmOCR-Bench while being ~9× smaller and significantly faster than competing approaches.
## Highlights
* ⚡ **Speed:** 3.3× faster than Chandra OCR, 1.7× faster than OlmOCR, 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
* 💸 **Efficiency:** Processes 5.71 pages/s on a single H100 (~493k pages/day) for **<$0.01 per 1,000 pages**
* 🧠 **End-to-End:** Fully differentiable, no external OCR pipeline
* 🧾 **Versatile:** Handles tables, receipts, forms, multi-column layouts, and math notation
* **Image detection:** Predicts bounding boxes for embedded images (bbox variants)
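The pages-per-day figure in the Efficiency bullet follows directly from the quoted throughput. A minimal sketch of that arithmetic, assuming the GPU runs at the quoted 5.71 pages/s continuously for 24 hours (the function names are illustrative, not part of any LightOn tooling):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_day(pages_per_second: float) -> int:
    """Pages processed in 24 hours at a constant throughput."""
    return round(pages_per_second * SECONDS_PER_DAY)


def seconds_per_1000_pages(pages_per_second: float) -> float:
    """Wall-clock seconds needed to process 1,000 pages."""
    return 1000 / pages_per_second


print(pages_per_day(5.71))                      # 493344, i.e. ~493k pages/day
print(round(seconds_per_1000_pages(5.71), 1))   # 175.1 seconds per 1,000 pages
```

Real-world throughput depends on page complexity, batch size, and serving configuration, so these numbers are best read as an upper-bound estimate for planning.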
---
**[Paper](https://arxiv.org/pdf/2601.14251)** | **[Blog Post](https://huggingface.co/blog/lightonai/lightonocr-2)** | **[Demo](https://huggingface.co/spaces/lightonai/LightOnOCR-2-1B-Demo)** | **[Dataset](https://huggingface.co/datasets/lightonai/LightOnOCR-mix-0126)** | **[BBox Dataset](https://huggingface.co/datasets/lightonai/LightOnOCR-bbox-mix-0126)** | **[Finetuning Notebook](https://colab.research.google.com/drive/1WjbsFJZ4vOAAlKtcCauFLn_evo5UBRNa?usp=sharing)** | **[LightOn blog entry](https://www.lighton.ai/lighton-blogs/lighton-opens-a-new-field-for-ai-with-lightonocr-2-document-intelligence)**
---
## Model Variants
| Variant | Description |
|---------|-------------|
| **[LightOnOCR-2-1B](https://huggingface.co/lightonai/LightOnOCR-2-1B)** | Best OCR model |
| **[LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)** | Base model, ideal for fine-tuning |
| **[LightOnOCR-2-1B-bbox](https://huggingface.co/lightonai/LightOnOCR-2-1B-bbox)** | Best model with image bounding boxes |
| **[LightOnOCR-2-1B-bbox-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-bbox-base)** | Base bbox model, ideal for fine-tuning |
| **[LightOnOCR-2-1B-ocr-soup](https://huggingface.co/lightonai/LightOnOCR-2-1B-ocr-soup)** | Merged variant for extra robustness |
| **[LightOnOCR-2-1B-bbox-soup](https://huggingface.co/lightonai/LightOnOCR-2-1B-bbox-soup)** | Merged variant: OCR + bbox combined |
---
## Benchmarks
<div align="center">
<img src="benchmark.png" alt="OlmOCR-Bench Results" width="900"/>
</div>
*See the [paper](https://arxiv.org/pdf/2601.14251) for full benchmark details and methodology.*
---
## Usage with Transformers
> **Note:** LightOnOCR-2 is available in the latest transformers release, starting from v5.
```bash
uv pip install "transformers>=5.0.0"
uv pip install pillow pypdfium2
```
```python
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor

# Pick the best available device; MPS runs in float32, other devices in bfloat16
device = "mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float32 if device == "mps" else torch.bfloat16

model = LightOnOcrForConditionalGeneration.from_pretrained("lightonai/LightOnOCR-2-1B", torch_dtype=dtype).to(device)
processor = LightOnOcrProcessor.from_pretrained("lightonai/LightOnOCR-2-1B")

# A single user turn whose content is just the page image
url = "https://huggingface.co/datasets/hf-internal-testing/fixtures_ocr/resolve/main/SROIE-receipt.jpeg"
conversation = [{"role": "user", "content": [{"type": "image", "url": url}]}]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
# Move every tensor to the device; cast only floating-point tensors to the model dtype
inputs = {k: v.to(device=device, dtype=dtype) if v.is_floating_point() else v.to(device) for k, v in inputs.items()}

output_ids = model.generate(**inputs, max_new_tokens=1024)
# Drop the prompt tokens and decode only the newly generated text
generated_ids = output_ids[0, inputs["input_ids"].shape[1]:]
output_text = processor.decode(generated_ids, skip_special_tokens=True)
print(output_text)
```
---
## Usage with vLLM
```bash
vllm serve lightonai/LightOnOCR-2-1B \
--limit-mm-per-prompt '{"image": 1}' --mm-processor-cache-gb 0 --no-enable-prefix-caching
```
```python
import base64
import io

import pypdfium2 as pdfium
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/LightOnOCR-2-1B"

# Download PDF from arXiv
pdf_url = "https://arxiv.org/pdf/2412.13663"
pdf_response = requests.get(pdf_url)
pdf_response.raise_for_status()
pdf_data = pdf_response.content

# Open PDF and convert first page to image
pdf = pdfium.PdfDocument(pdf_data)
page = pdf[0]
# Render at 200 DPI (scale factor = 200/72 ≈ 2.77)
pil_image = page.render(scale=2.77).to_pil()

# Convert to base64
buffer = io.BytesIO()
pil_image.save(buffer, format="PNG")
image_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")

# Make request
payload = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"}
        }]
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}
response = requests.post(ENDPOINT, json=payload)
response.raise_for_status()  # Fail early if the server returned an error
text = response.json()["choices"][0]["message"]["content"]
print(text)
```
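When processing more than one page, it can help to factor the request body into a small helper. The sketch below builds the same payload as the example above from raw PNG bytes; the function name `build_ocr_payload` and its parameters are illustrative assumptions, not part of vLLM or any LightOn API.

```python
import base64


def build_ocr_payload(
    png_bytes: bytes,
    model: str = "lightonai/LightOnOCR-2-1B",
    max_tokens: int = 4096,
) -> dict:
    """Build an OpenAI-compatible chat request carrying one page image.

    Hypothetical helper: it mirrors the payload from the example above,
    with the image sent inline as a base64 data URL.
    """
    image_b64 = base64.b64encode(png_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            }],
        }],
        "max_tokens": max_tokens,
        "temperature": 0.2,
        "top_p": 0.9,
    }
```

A multi-page run would then render each page `pdf[i]` for `i in range(len(pdf))`, save it to PNG bytes as shown above, and post `build_ocr_payload(png_bytes)` to the endpoint, one request per page to match the `--limit-mm-per-prompt '{"image": 1}'` server setting.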
---
## Rendering and Preprocessing Tips
* Render PDF pages to images at 200 DPI, targeting a longest dimension of **1540px**
* Maintain aspect ratio to preserve text geometry
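One way to apply these tips is to derive a single render scale from the page size, so that the longest side comes out at the target and both axes are scaled equally. This is a sketch of one interpretation (rendering directly at the target size rather than rendering at 200 DPI and downscaling afterwards); the helper names are hypothetical.

```python
def scale_for_longest_side(width_pt: float, height_pt: float, target_px: int = 1540) -> float:
    """Return the render scale (pixels per PDF point) that makes the
    longest page side equal to target_px. PDF points are 1/72 inch."""
    longest = max(width_pt, height_pt)
    if longest <= 0:
        raise ValueError("page dimensions must be positive")
    return target_px / longest


def rendered_size(width_pt: float, height_pt: float, target_px: int = 1540) -> tuple[int, int]:
    """Pixel size of the rendered page; one scale for both axes keeps the aspect ratio."""
    scale = scale_for_longest_side(width_pt, height_pt, target_px)
    return round(width_pt * scale), round(height_pt * scale)


# A4 portrait is roughly 595 x 842 points
print(rendered_size(595, 842))  # (1088, 1540)
```

With pypdfium2, the page size in points should be available from `page.get_size()`, and the resulting scale can be passed to `page.render(scale=...)` as in the vLLM example above.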
---
## Fine-tuning
LightOnOCR-2 is fully differentiable and supports:
* LoRA fine-tuning
* Domain adaptation (receipts, scientific articles, forms, etc.)
* Multilingual fine-tuning with task-specific corpora
For fine-tuning, we recommend starting with the **[LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)** variant.
---
## License
Apache License 2.0
---
## Citation
```bibtex
@misc{lightonocr2_2026,
title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
year = {2026},
howpublished = {\url{https://arxiv.org/abs/2601.14251}}
}
```
[lightonai/LightOnOCR-2-1B](https://huggingface.co/lightonai/LightOnOCR-2-1B)
[lightonai](https://huggingface.co/lightonai)