TrOCR-base Printed Text (GGUF)

Text recognition model for CrispEmbed. Larger and more capable than trocr-small-printed. Recognizes printed text from cropped text-line images. Uses GPT-2 BPE tokenizer (50,265 tokens).

Architecture: BEiT encoder (12L, 768d, 12 heads) + TrOCR decoder (12L, 1024d, 16 heads). 333M parameters. Tied embeddings (lm_head = embed_tokens).
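To make the effect of tied embeddings concrete, the sketch below counts the weights in the one matrix shared between embed_tokens and lm_head, using only the vocabulary size, decoder width, and rounded parameter total stated above. The function name is illustrative, and the share of the total is approximate because 333M is a rounded figure.

```python
# Rough arithmetic for the tied embedding matrix (lm_head = embed_tokens).
# All three constants come from the model description; nothing is read from a file.
VOCAB_SIZE = 50_265          # tokenizer vocabulary size
DECODER_D_MODEL = 1_024      # decoder hidden width
TOTAL_PARAMS = 333_000_000   # rounded total parameter count


def embedding_matrix_params(vocab_size: int, d_model: int) -> int:
    """Number of weights in a vocab_size x d_model embedding matrix."""
    return vocab_size * d_model


# With tying, this matrix is stored once and reused as the output projection.
shared = embedding_matrix_params(VOCAB_SIZE, DECODER_D_MODEL)
share_of_total = shared / TOTAL_PARAMS

print(f"shared embed_tokens/lm_head matrix: {shared:,} weights")
print(f"approximate share of all parameters: {share_of_total:.1%}")
```

An untied output projection of the same shape would add that many weights again, so tying keeps the vocabulary-sized matrix from being stored twice.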

Source: microsoft/trocr-base-printed (MIT).

Model Variants

| Variant | Size | Recognition quality |
|---|---|---|
| F32 | 1.3 GB | exact token match vs HuggingFace (greedy) |
| F16 | 639 MB | identical to F32 |
| Q8_0 | 340 MB | identical to F32 |
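As a rough sanity check on the sizes above, the following sketch estimates file size from the parameter count and the storage cost per weight. It assumes every tensor is stored in the named format and ignores GGUF metadata. The Q8_0 figure uses the ggml block layout of 32 int8 weights plus one 16-bit scale (34 bytes per 32 weights). Because 333M is a rounded count and real files may keep some tensors in a different precision, the estimates only approximate the listed sizes.

```python
# Back-of-the-envelope size estimate per storage format.
# Assumption: all weights use the same format; metadata overhead is ignored.
N_PARAMS = 333_000_000  # rounded parameter count from the model description

BYTES_PER_WEIGHT = {
    "F32": 4.0,        # 32-bit float
    "F16": 2.0,        # 16-bit float
    "Q8_0": 34 / 32,   # 32 int8 values + one fp16 scale per block
}


def estimate_size_mb(n_params: int, fmt: str) -> float:
    """Estimated size in decimal megabytes (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_WEIGHT[fmt] / 1e6


for fmt in BYTES_PER_WEIGHT:
    bits = BYTES_PER_WEIGHT[fmt] * 8
    print(f"{fmt:5s} ~{estimate_size_mb(N_PARAMS, fmt):7.0f} MB ({bits:.1f} bits/weight)")
```

The estimates fall in the same range as the listed sizes and show the expected pattern: each step down roughly halves the file.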

Q4_K has not been tested. With d_model=1024, this model should tolerate Q4_K better than the small model (256d), but Q8_0 is recommended for this model size.

Usage

Pair with cstr/dbnet-ic15-GGUF for end-to-end OCR.

crispembed --det dbnet-ic15-q4_k.gguf -m trocr-base-printed-q8_0.gguf --ocr document.png
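For scripted use, the command above can be assembled and launched from Python. This is a minimal sketch: it uses only the flags shown in the command (--det, -m, --ocr), and the helper names build_ocr_command and run_ocr are hypothetical. The format of what crispembed prints is not documented here, so the output is returned as raw text rather than parsed.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(det_model: str, rec_model: str, image: str) -> list[str]:
    """Build the argument list for the OCR command shown above."""
    return ["crispembed", "--det", det_model, "-m", rec_model, "--ocr", image]


def run_ocr(det_model: str, rec_model: str, image: str) -> str:
    """Run the command and return stdout unparsed (output format is an unknown)."""
    cmd = build_ocr_command(det_model, rec_model, image)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_ocr_command(
    "dbnet-ic15-q4_k.gguf", "trocr-base-printed-q8_0.gguf", "document.png"
)
print(shlex.join(cmd))

# Only launch the binary when it is on PATH and the input image exists.
if shutil.which("crispembed") and Path("document.png").exists():
    print(run_ocr(*cmd[2:3], cmd[4], cmd[6]))
else:
    print("crispembed or document.png not found; command not executed")
```

Passing the arguments as a list avoids shell quoting problems when image paths contain spaces.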

License

MIT (same as microsoft/trocr-base-printed).
