Unlimited-OCR — MLX Block float MX FP4

MLX quantization of baidu/Unlimited-OCR, a 3B vision-language OCR model that extends DeepSeek-OCR to one-shot, long-horizon document parsing. This variant uses Block float MX FP4 quantization (5.66 effective bits/weight).

Quantized by: sahilchachra

Note on effective bpw: mlx-vlm's quantizers act only on the language tower's linear weights. The vision encoder and embeddings stay at bf16, so the effective bits per weight (bpw) is an average over the quantized text decoder and the full-precision vision components, which is why it sits above 4 bits.
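
The averaging described in the note can be sketched as simple arithmetic. This is a rough illustration, not how mlx-vlm reports the figure: it assumes MXFP4 stores 4-bit elements in blocks of 32 that share one 8-bit scale (about 4.25 bits per quantized weight) and that everything else stays at 16 bits. The function names are hypothetical.

```python
# Assumption: MXFP4 = 4-bit elements + one 8-bit scale per block of 32 weights.
MXFP4_BITS = 4 + 8 / 32   # 4.25 bits per quantized weight
BF16_BITS = 16.0          # unquantized vision encoder and embeddings


def mixed_bpw(quant_fraction, quant_bits=MXFP4_BITS, full_bits=BF16_BITS):
    """Average bits/weight when only `quant_fraction` of the weights is quantized."""
    return quant_fraction * quant_bits + (1.0 - quant_fraction) * full_bits


def quant_fraction_for(target_bpw, quant_bits=MXFP4_BITS, full_bits=BF16_BITS):
    """Invert mixed_bpw: which share of weights must be quantized to hit target_bpw."""
    return (full_bits - target_bpw) / (full_bits - quant_bits)


# Share of weights that would need to be quantized to reach the reported 5.66 bpw.
fraction = quant_fraction_for(5.66)
print(f"quantized share: {fraction:.2f}")
```

Under these assumptions, the reported 5.66 bits/weight corresponds to roughly 88% of the weights being quantized, with the remainder kept at bf16.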

About the model

Architecture: DeepEncoder vision (SAM-ViT-B + CLIP-L/14, 1024×1024 input, 16× downsample) → linear projector → DeepSeek-V2 MoE text decoder (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token).
Task: multilingual OCR / document parsing — single image, multi-page, and PDF (one-shot long-horizon parsing). Supports gundam (crop) and base resolution modes.
License: MIT (inherited from the base model).

Benchmark results

Evaluated with MLX on an Apple M4 Pro (24 GB) using the FUNSD test set (50 scanned form images).

Metric	Block float MX FP4	FP16 baseline
FUNSD CER ↓	2.3944	1.7588
Decode tok/s	251.9	146.2
Peak memory	3.61 GB	7.62 GB
Disk size	2260 MB	6464 MB

All variants compared

Variant	CER ↓	Tok/s	Memory	Disk
FP16 (baseline)	1.7588	146.2	7.62 GB	6464 MB
MXFP8	1.4556	205.6	4.98 GB	3660 MB
Int8	1.5720	205.2	5.06 GB	3747 MB
MXFP4	2.3944	251.9	3.61 GB	2260 MB
Int4	2.2879	252.6	3.7 GB	2347 MB
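
To read the trade-off at a glance, the MXFP4 row can be expressed relative to the FP16 baseline. The snippet below is a small sketch that only re-uses the numbers from the table above; the dictionary keys and the `relative` helper are illustrative names.

```python
# Values copied from the comparison table above.
fp16 = {"cer": 1.7588, "tok_s": 146.2, "mem_gb": 7.62, "disk_mb": 6464}
mxfp4 = {"cer": 2.3944, "tok_s": 251.9, "mem_gb": 3.61, "disk_mb": 2260}


def relative(variant, baseline):
    """Return each metric of `variant` as a multiple of the baseline value."""
    return {key: variant[key] / baseline[key] for key in baseline}


ratios = relative(mxfp4, fp16)
for key, value in ratios.items():
    print(f"{key}: {value:.2f}x of FP16")
```

On these figures, MXFP4 decodes at about 1.72x the FP16 speed while using about 47% of the peak memory and 35% of the disk space, at the cost of a CER roughly 1.36x the baseline.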

Usage

pip install mlx-vlm

from mlx_vlm import load, generate

model, processor = load("sahilchachra/unlimited-ocr-mxfp4-mlx")

# Single-image OCR (Gundam mode)
response = generate(model, processor,
                    prompt="<image>document parsing.",
                    image="path/to/document.jpg",
                    max_tokens=4096, verbose=True)

Prompting guide

Unlimited-OCR uses the DeepSeek-OCR prompt vocabulary. The prompt is just an instruction; prefix it with <|grounding|> whenever you also want bounding boxes for what was read.

Task	Prompt
Document → Markdown (layout-aware, with boxes)	`<
Plain text OCR (just the text, no layout)	`<image>Free OCR.`
OCR with bounding boxes	`<
Native parse	`<image>document parsing.`
Parse a figure / chart / diagram	`<image>Parse the figure.`
Describe the image (general VQA)	`<image>Describe this image in detail.`

Note: Unlike the GGUF/llama.cpp workflow, mlx-vlm requires the literal <image> token in the prompt and a separate image= argument pointing to the file path.

Understanding the output (grounding tokens)

With <|grounding|>, the model interleaves the recognized text with detection boxes:

<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623
<|det|>text  [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra
<|det|>text  [37, 483, 329, 543]<|/det|>Total Due: $44.00

Each [x1, y1, x2, y2] is the bounding box (top-left → bottom-right) of that span. Drop the <|det|>...<|/det|> tags if you only want the text, or parse them to overlay boxes / build a layout.
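
A minimal sketch of both options (parsing the boxes, or dropping the tags) follows. It assumes the output matches the sample above exactly: a label, a four-integer box, then the text up to the next <|det|> tag. Real model output may differ, so treat the pattern as a starting point. `Span`, `parse_grounded`, and `strip_grounding` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    label: str                      # e.g. "title" or "text"
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2)
    text: str                       # recognized text following the tag


# Assumed layout: <|det|>label [x1, y1, x2, y2]<|/det|>text
DET_PATTERN = re.compile(
    r"<\|det\|>\s*(\w+)\s*"
    r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*<\|/det\|>"
    r"(.*?)(?=<\|det\|>|\Z)",
    re.DOTALL,
)


def parse_grounded(output: str) -> List[Span]:
    """Split grounded output into (label, box, text) spans."""
    spans = []
    for match in DET_PATTERN.finditer(output):
        box = tuple(int(match.group(i)) for i in range(2, 6))
        spans.append(Span(match.group(1), box, match.group(6).strip()))
    return spans


def strip_grounding(output: str) -> str:
    """Remove every <|det|>...<|/det|> tag and keep only the text."""
    return re.sub(r"<\|det\|>.*?<\|/det\|>", "", output, flags=re.DOTALL)


sample = (
    "<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623\n"
    "<|det|>text  [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra\n"
    "<|det|>text  [37, 483, 329, 543]<|/det|>Total Due: $44.00"
)
for span in parse_grounded(sample):
    print(span.label, span.box, span.text)
```

The parsed boxes can then be passed to any drawing library to overlay them on the page image.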

Tip (long documents): For multi-page scans, run the model on each page separately and concatenate the outputs in page order.
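
One way to sketch the page-by-page approach is shown below. `parse_document` and `make_mlx_ocr` are hypothetical helpers, not part of mlx-vlm. The mlx-vlm import is kept inside the factory so the joining logic can be tried without the model, and the handling of the return value assumes that `generate` returns either a plain string or an object with a `.text` attribute, depending on the installed mlx-vlm version.

```python
from typing import Callable, Iterable


def parse_document(page_paths: Iterable[str],
                   ocr_page: Callable[[str], str],
                   separator: str = "\n\n") -> str:
    """Run `ocr_page` on each page in order and join the results."""
    return separator.join(ocr_page(path).strip() for path in page_paths)


def make_mlx_ocr(model_id: str = "sahilchachra/unlimited-ocr-mxfp4-mlx",
                 prompt: str = "<image>document parsing.",
                 max_tokens: int = 4096) -> Callable[[str], str]:
    """Build an `ocr_page` callable backed by mlx-vlm (loads the model once)."""
    from mlx_vlm import load, generate  # imported lazily; requires mlx-vlm

    model, processor = load(model_id)

    def ocr_page(path: str) -> str:
        result = generate(model, processor, prompt=prompt,
                          image=path, max_tokens=max_tokens)
        # Assumption: the result is a str or an object exposing `.text`.
        return getattr(result, "text", result)

    return ocr_page


# Usage on Apple Silicon (page file names are placeholders):
# ocr_page = make_mlx_ocr()
# markdown = parse_document(["page_001.jpg", "page_002.jpg"], ocr_page)
```

Loading the model once and reusing the callable avoids paying the load cost for every page.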

Important — model_type mapping

The original baidu/Unlimited-OCR uses model_type: "unlimited-ocr", which mlx-vlm does not recognize directly. This quantized variant ships with the config already patched:

config.json → "model_type": "deepseekocr" (was "unlimited-ocr"), auto_map removed
processor_config.json → "processor_class": "DeepseekOCRProcessor" (was "UnlimitedOCRHFProcessor")

No manual patching needed — just load() and go.

If you are converting the original model yourself, apply the changes above to both files before running mlx_vlm convert.
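
For a local copy of the original model, the patch could be scripted roughly as follows. This is a sketch that assumes both JSON files sit at the top level of the downloaded model directory; `patch_configs` is a hypothetical helper, and the keys and values are the ones listed above.

```python
import json
from pathlib import Path


def patch_configs(model_dir: str) -> None:
    """Apply the model_type / processor_class changes listed above in place."""
    root = Path(model_dir)

    # config.json: switch model_type and drop auto_map if present.
    config_path = root / "config.json"
    config = json.loads(config_path.read_text())
    config["model_type"] = "deepseekocr"
    config.pop("auto_map", None)
    config_path.write_text(json.dumps(config, indent=2))

    # processor_config.json: point at the DeepSeek-OCR processor class.
    processor_path = root / "processor_config.json"
    processor = json.loads(processor_path.read_text())
    processor["processor_class"] = "DeepseekOCRProcessor"
    processor_path.write_text(json.dumps(processor, indent=2))


# Usage (the path is a placeholder for your local download):
# patch_configs("path/to/Unlimited-OCR")
```

The function overwrites both files, so keep a backup of the originals if you may need them later.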

All variants in this collection

MLX (Apple Silicon — this collection)

Model	Variant	Disk
sahilchachra/unlimited-ocr-4bit-mlx	Affine int4	2347 MB
sahilchachra/unlimited-ocr-8bit-mlx	Affine int8	3747 MB
sahilchachra/unlimited-ocr-mxfp4-mlx	Block float MX FP4 ← this model	2260 MB
sahilchachra/unlimited-ocr-mxfp8-mlx	Block float MX FP8	3660 MB

GGUF (llama.cpp — cross-platform)

Model	Notes
sahilchachra/Unlimited-OCR-GGUF	K-quants & i-quants (BF16 → IQ2_M). Requires llama.cpp PR #17400.

Credits

Base model: baidu/Unlimited-OCR (MIT) — builds on deepseek-ai/DeepSeek-OCR.
Quantized by sahilchachra.
