Unlimited-OCR — MLX Block float MX FP4

MLX quantization of baidu/Unlimited-OCR, a 3B vision-language OCR model that extends DeepSeek-OCR to one-shot, long-horizon document parsing. This variant uses Block float MX FP4 quantization (5.66 effective bits/weight).

Quantized by: sahilchachra

Note on effective bpw: mlx-vlm's quantizers only act on the language tower's linear weights. The vision encoder and embeddings stay at bf16, so the on-disk size averages the quantized text decoder with the full-precision vision components.
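The averaging described in the note can be made concrete with a small calculation. This is a rough sketch, not the method used to produce the 5.66 figure. It assumes MXFP4 stores 4-bit elements plus one shared 8-bit scale per 32-element block (4.25 bits per weight) and that every unquantized weight is stored at 16 bits. The helper names are hypothetical.

```python
def effective_bpw(frac_quantized, quant_bpw=4.25, full_bpw=16.0):
    """Parameter-weighted average bits per weight.

    frac_quantized: share of all weights that were quantized (0..1).
    quant_bpw: assumed cost of a quantized weight (4 bits + 8/32 scale bits).
    full_bpw: assumed cost of an unquantized bf16 weight.
    """
    return frac_quantized * quant_bpw + (1.0 - frac_quantized) * full_bpw


def quantized_fraction(target_bpw, quant_bpw=4.25, full_bpw=16.0):
    """Invert effective_bpw: which quantized share yields target_bpw?"""
    return (full_bpw - target_bpw) / (full_bpw - quant_bpw)


frac = quantized_fraction(5.66)
print(f"quantized share implied by 5.66 bpw: {frac:.2f}")
```

Under these assumptions, 5.66 effective bits/weight corresponds to roughly 88% of the weights being quantized, with the remainder left at bf16.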

About the model

  • Architecture: DeepEncoder vision (SAM-ViT-B + CLIP-L/14, 1024×1024 input, 16× downsample) → linear projector → DeepSeek-V2 MoE text decoder (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token).
  • Task: multilingual OCR / document parsing — single image, multi-page, and PDF (one-shot long-horizon parsing). Supports gundam (crop) and base resolution modes.
  • License: MIT (inherited from the base model).

Benchmark results

Evaluated on Apple M4 Pro (24 GB) with MLX on the FUNSD test set (50 scanned form images).

| Metric | Block float MX FP4 | FP16 baseline |
| --- | --- | --- |
| FUNSD CER ↓ | 2.3944 | 1.7588 |
| Decode tok/s | 251.9 | 146.2 |
| Peak memory | 3.61 GB | 7.62 GB |
| Disk size | 2260 MB | 6464 MB |
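For readers unfamiliar with the metric, character error rate (CER) is commonly defined as the character-level edit distance between the model output and the reference, divided by the reference length. The sketch below implements that common definition. The card does not state how its CER values were computed, normalized, or scaled, so this is an illustration of the metric rather than the evaluation script behind the numbers above.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

With this definition CER is not capped at 1: a hypothesis with many extra characters can need more edits than the reference has characters.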

All variants compared

| Variant | CER ↓ | Tok/s | Memory | Disk |
| --- | --- | --- | --- | --- |
| FP16 (baseline) | 1.7588 | 146.2 | 7.62 GB | 6464 MB |
| MXFP8 | 1.4556 | 205.6 | 4.98 GB | 3660 MB |
| Int8 | 1.5720 | 205.2 | 5.06 GB | 3747 MB |
| MXFP4 | 2.3944 | 251.9 | 3.61 GB | 2260 MB |
| Int4 | 2.2879 | 252.6 | 3.7 GB | 2347 MB |
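To read the comparison as trade-offs rather than raw numbers, each variant can be expressed relative to the FP16 baseline. The sketch below only re-derives ratios from the figures reported above; the dictionary layout and function name are illustrative choices.

```python
# Values copied from the comparison table above.
variants = {
    "FP16":  {"cer": 1.7588, "tok_s": 146.2, "mem_gb": 7.62, "disk_mb": 6464},
    "MXFP8": {"cer": 1.4556, "tok_s": 205.6, "mem_gb": 4.98, "disk_mb": 3660},
    "Int8":  {"cer": 1.5720, "tok_s": 205.2, "mem_gb": 5.06, "disk_mb": 3747},
    "MXFP4": {"cer": 2.3944, "tok_s": 251.9, "mem_gb": 3.61, "disk_mb": 2260},
    "Int4":  {"cer": 2.2879, "tok_s": 252.6, "mem_gb": 3.7,  "disk_mb": 2347},
}


def relative_to_baseline(name, baseline="FP16"):
    """Speed, memory and disk as ratios to the baseline; CER as a difference."""
    v, b = variants[name], variants[baseline]
    return {
        "speedup": v["tok_s"] / b["tok_s"],
        "mem_ratio": v["mem_gb"] / b["mem_gb"],
        "disk_ratio": v["disk_mb"] / b["disk_mb"],
        "cer_delta": v["cer"] - b["cer"],
    }


for name in variants:
    r = relative_to_baseline(name)
    print(f"{name:6s} x{r['speedup']:.2f} speed, "
          f"{r['mem_ratio']:.0%} memory, {r['disk_ratio']:.0%} disk, "
          f"CER {r['cer_delta']:+.4f}")
```

For MXFP4 this works out to about 1.72x the decode speed at roughly 47% of the peak memory and 35% of the disk size, with a CER 0.6356 higher than the baseline.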

Usage

pip install mlx-vlm

from mlx_vlm import load, generate

model, processor = load("sahilchachra/unlimited-ocr-mxfp4-mlx")

# Single-image OCR (Gundam mode)
response = generate(model, processor,
                    prompt="<image>document parsing.",
                    image="path/to/document.jpg",
                    max_tokens=4096, verbose=True)

Prompting guide

Unlimited-OCR uses the DeepSeek-OCR prompt vocabulary. The prompt is just an instruction; prefix it with <|grounding|> whenever you also want bounding boxes for what was read.

| Task | Prompt |
| --- | --- |
| Document → Markdown (layout-aware, with boxes) | `< |
| Plain text OCR (just the text, no layout) | <image>Free OCR. |
| OCR with bounding boxes | `< |
| Native parse | <image>document parsing. |
| Parse a figure / chart / diagram | <image>Parse the figure. |
| Describe the image (general VQA) | <image>Describe this image in detail. |

Note: Unlike the GGUF/llama.cpp workflow, mlx-vlm requires the literal <image> token in the prompt and a separate image= argument pointing to the file path.
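A small helper can keep the `<image>` token and the optional grounding prefix consistent across calls. This is a minimal sketch: `build_prompt` is a hypothetical name, and placing `<|grounding|>` directly after `<image>` with no separator is an assumption based on the description above, not something the card spells out.

```python
IMAGE_TOKEN = "<image>"
GROUNDING_TOKEN = "<|grounding|>"


def build_prompt(instruction, grounding=False):
    """Return '<image>' + optional '<|grounding|>' + the instruction text.

    Assumption: the grounding token sits between the image token and the
    instruction, with no newline or space in between.
    """
    prefix = GROUNDING_TOKEN if grounding else ""
    return f"{IMAGE_TOKEN}{prefix}{instruction}"


print(build_prompt("document parsing."))
print(build_prompt("document parsing.", grounding=True))
```

The resulting string is what would be passed as `prompt=` to `generate`, alongside the separate `image=` path.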

Understanding the output (grounding tokens)

With <|grounding|>, the model interleaves the recognized text with detection boxes:

<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623
<|det|>text  [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra
<|det|>text  [37, 483, 329, 543]<|/det|>Total Due: $44.00

Each [x1, y1, x2, y2] is the bounding box (top-left → bottom-right) of that span. Drop the <|det|>...<|/det|> tags if you only want the text, or parse them to overlay boxes / build a layout.
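Both options mentioned above (dropping the tags or parsing them) can be handled with a regular expression. The sketch below assumes the output looks exactly like the three sample lines shown, that is a label followed by four integers inside each `<|det|>` tag; real output may differ, and the names `Span`, `parse_grounded`, and `strip_boxes` are hypothetical.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    label: str   # e.g. "title" or "text"
    box: tuple   # (x1, y1, x2, y2), top-left to bottom-right
    text: str    # recognized text that follows the tag


# One <|det|>label [x1, y1, x2, y2]<|/det|> tag, then text up to the
# next tag or the end of the line.
DET_RE = re.compile(
    r"<\|det\|>\s*(\w+)\s*"
    r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\s*"
    r"<\|/det\|>(.*?)(?=<\|det\|>|$)",
    re.MULTILINE,
)


def parse_grounded(output):
    """Extract (label, box, text) spans from grounded output."""
    spans = []
    for m in DET_RE.finditer(output):
        box = tuple(int(m.group(i)) for i in range(2, 6))
        spans.append(Span(m.group(1), box, m.group(6).strip()))
    return spans


def strip_boxes(output):
    """Remove every <|det|>...<|/det|> tag, keeping only the text."""
    return re.sub(r"<\|det\|>.*?<\|/det\|>", "", output)


sample = "\n".join([
    "<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623",
    "<|det|>text  [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra",
    "<|det|>text  [37, 483, 329, 543]<|/det|>Total Due: $44.00",
])
for span in parse_grounded(sample):
    print(span.label, span.box, span.text)
```

The parsed boxes can then be drawn over the source image or sorted by position to rebuild a reading order.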

Tip — long documents: For multi-page scans, run page-by-page and concatenate.
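The page-by-page approach from the tip can be sketched as a loop that runs OCR on each page image and joins the results. The function below is hypothetical and takes the per-page OCR step as a callable, so it does not depend on a particular mlx-vlm version; the commented usage assumes `generate` is called as in the Usage section and that its result can be converted to text with `str()`, which may not hold for every release.

```python
def parse_document(page_paths, ocr_page, separator="\n\n"):
    """Run ocr_page on each page path in order and concatenate the text.

    page_paths: iterable of image paths, one per page.
    ocr_page: callable taking a path and returning that page's text.
    separator: string placed between consecutive pages.
    """
    pages = [ocr_page(path) for path in page_paths]
    return separator.join(pages)


# Hypothetical usage with the model loaded as in the Usage section:
#
#   from mlx_vlm import load, generate
#   model, processor = load("sahilchachra/unlimited-ocr-mxfp4-mlx")
#
#   def ocr_page(path):
#       result = generate(model, processor,
#                         prompt="<image>document parsing.",
#                         image=path, max_tokens=4096)
#       return str(result)  # assumption: the result converts to plain text
#
#   text = parse_document(["page1.jpg", "page2.jpg"], ocr_page)
```

Passing the OCR step in as a callable also makes it easy to swap prompts per page or to cache results for pages that were already processed.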

Important — model_type mapping

The original baidu/Unlimited-OCR uses model_type: "unlimited-ocr" which is not directly recognized by mlx-vlm. This quantized variant ships with the config already patched:

  • config.json: "model_type": "deepseekocr" (was "unlimited-ocr"), auto_map removed
  • processor_config.json: "processor_class": "DeepseekOCRProcessor" (was "UnlimitedOCRHFProcessor")

No manual patching needed — just load() and go.

If you are converting the original model yourself, apply these two changes before running mlx_vlm convert.
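For those converting the original model themselves, the two changes can be applied with a short script. This is a minimal sketch under the assumption that both files sit in the top level of a local copy of the model directory; `patch_for_mlx_vlm` is a hypothetical helper name, and only the keys listed above are touched.

```python
import json
from pathlib import Path


def patch_for_mlx_vlm(model_dir):
    """Apply the model_type / processor_class changes in place."""
    model_dir = Path(model_dir)

    # config.json: switch model_type and drop auto_map.
    cfg_path = model_dir / "config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["model_type"] = "deepseekocr"
    cfg.pop("auto_map", None)
    cfg_path.write_text(json.dumps(cfg, indent=2))

    # processor_config.json: switch the processor class.
    proc_path = model_dir / "processor_config.json"
    proc = json.loads(proc_path.read_text())
    proc["processor_class"] = "DeepseekOCRProcessor"
    proc_path.write_text(json.dumps(proc, indent=2))
```

Run it once against the downloaded model directory before conversion; all other keys in both files are left unchanged.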

All variants in this collection

MLX (Apple Silicon — this collection)

| Model | Variant | Disk |
| --- | --- | --- |
| sahilchachra/unlimited-ocr-4bit-mlx | Affine int4 | 2347 MB |
| sahilchachra/unlimited-ocr-8bit-mlx | Affine int8 | 3747 MB |
| sahilchachra/unlimited-ocr-mxfp4-mlx | Block float MX FP4 ← this model | 2260 MB |
| sahilchachra/unlimited-ocr-mxfp8-mlx | Block float MX FP8 | 3660 MB |

GGUF (llama.cpp — cross-platform)

| Model | Notes |
| --- | --- |
| sahilchachra/Unlimited-OCR-GGUF | K-quants & i-quants (BF16 → IQ2_M). Requires llama.cpp PR #17400. |

Credits
