How to use sahilchachra/unlimited-ocr-mxfp4-mlx with MLX:

```python
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("sahilchachra/unlimited-ocr-mxfp4-mlx")
config = load_config("sahilchachra/unlimited-ocr-mxfp4-mlx")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
```
Unlimited-OCR — MLX Block float MX FP4
MLX quantization of baidu/Unlimited-OCR, a 3B vision-language OCR model that pushes DeepSeek-OCR one step further (one-shot, long-horizon document parsing). This variant uses Block float MX FP4 quantization (5.66 effective bits/weight).
Quantized by: sahilchachra
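Note on effective bpw (bits per weight): mlx-vlm's quantizers only act on the language tower's linear weights. The vision encoder and embeddings stay at bf16, so the on-disk size, and the effective bits/weight derived from it, is a weighted average of the quantized text decoder and the full-precision vision components. That is why the figure sits above 4 bits.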
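The weighted average described in the note can be sketched numerically. This is a rough illustration, not the procedure used to produce the 5.66 figure. It assumes that MXFP4 stores 4-bit values in blocks of 32 sharing one 8-bit scale (4.25 bits per quantized weight) and that every other weight stays at 16 bits. The helper names are hypothetical.

```python
# Assumption: MXFP4 packs 4-bit elements in blocks of 32 that share one
# 8-bit scale, i.e. 4 + 8/32 = 4.25 bits per quantized weight.
MXFP4_BITS = 4 + 8 / 32
FULL_BITS = 16.0  # bf16 for the unquantized vision encoder and embeddings


def effective_bpw(quantized_fraction: float, quantized_bits: float = MXFP4_BITS) -> float:
    """Weighted-average bits/weight for a partially quantized model."""
    return quantized_fraction * quantized_bits + (1.0 - quantized_fraction) * FULL_BITS


def quantized_fraction_for(target_bpw: float, quantized_bits: float = MXFP4_BITS) -> float:
    """Invert effective_bpw: the share of weights that must be quantized."""
    return (FULL_BITS - target_bpw) / (FULL_BITS - quantized_bits)


fraction = quantized_fraction_for(5.66)
print(f"quantized share implied by 5.66 bpw: {fraction:.2%}")
print(f"round trip: {effective_bpw(fraction):.2f} bits/weight")
```

Under these assumptions about 88% of the weights would be quantized. The real split depends on the checkpoint's actual parameter counts.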
About the model
- Architecture: DeepEncoder vision (SAM-ViT-B + CLIP-L/14, 1024×1024 input, 16× downsample) → linear projector → DeepSeek-V2 MoE text decoder (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token).
- Task: multilingual OCR / document parsing — single image, multi-page, and PDF (one-shot long-horizon parsing). Supports gundam (crop) and base resolution modes.
- License: MIT (inherited from the base model).
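The encoder figures in the architecture bullet suggest how many vision tokens reach the text decoder per image. The sketch below is a back-of-the-envelope estimate under two assumptions the card does not state: a ViT patch size of 16 pixels, and the 16× downsample acting on the number of patch tokens. The function name is hypothetical.

```python
def vision_tokens(image_size: int = 1024, patch_size: int = 16, downsample: int = 16) -> int:
    """Estimate vision tokens passed to the decoder for a square input.

    Assumptions (not confirmed by the model card): patch_size=16 and a
    downsample factor that divides the patch-token count.
    """
    patches_per_side = image_size // patch_size
    patch_tokens = patches_per_side * patches_per_side
    return patch_tokens // downsample


print(vision_tokens())  # 1024x1024 input under the assumptions above
```

With these assumptions a 1024×1024 input yields 64 × 64 = 4096 patch tokens, reduced to 256 tokens for the decoder.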
Benchmark results
Evaluated on Apple M4 Pro (24 GB) with MLX on the FUNSD test set (50 scanned form images).
| Metric | Block float MX FP4 | FP16 baseline |
|---|---|---|
| FUNSD CER ↓ | 2.3944 | 1.7588 |
| Decode tok/s | 251.9 | 146.2 |
| Peak memory | 3.61 GB | 7.62 GB |
| Disk size | 2260 MB | 6464 MB |
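The trade-off in the table can be expressed as relative figures. The snippet below only does arithmetic on the numbers reported above; the variable names are illustrative.

```python
# Values copied from the benchmark table above.
fp16 = {"cer": 1.7588, "tok_s": 146.2, "mem_gb": 7.62, "disk_mb": 6464}
mxfp4 = {"cer": 2.3944, "tok_s": 251.9, "mem_gb": 3.61, "disk_mb": 2260}

speedup = mxfp4["tok_s"] / fp16["tok_s"]
mem_saving = 1 - mxfp4["mem_gb"] / fp16["mem_gb"]
disk_saving = 1 - mxfp4["disk_mb"] / fp16["disk_mb"]
cer_increase = mxfp4["cer"] / fp16["cer"] - 1

print(f"decode speedup:   {speedup:.2f}x")
print(f"peak memory:      {mem_saving:.1%} lower")
print(f"disk size:        {disk_saving:.1%} smaller")
print(f"CER (relative):   {cer_increase:.1%} higher")
```

On these numbers MXFP4 decodes about 1.7× faster and uses roughly half the peak memory, at the cost of a CER about 36% higher than FP16 in relative terms.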
All variants compared
| Variant | CER ↓ | Tok/s | Memory | Disk |
|---|---|---|---|---|
| FP16 (baseline) | 1.7588 | 146.2 | 7.62 GB | 6464 MB |
| MXFP8 | 1.4556 | 205.6 | 4.98 GB | 3660 MB |
| Int8 | 1.5720 | 205.2 | 5.06 GB | 3747 MB |
| MXFP4 | 2.3944 | 251.9 | 3.61 GB | 2260 MB |
| Int4 | 2.2879 | 252.6 | 3.7 GB | 2347 MB |
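One way to read the comparison is to pick the most accurate variant that fits a given memory budget. The sketch below encodes the table rows as data and applies that rule. The selection criterion and all names are assumptions for illustration, not a recommendation from the benchmark itself.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str
    cer: float
    tok_s: float
    mem_gb: float
    disk_mb: int


# Rows copied from the comparison table above.
VARIANTS = [
    Variant("FP16", 1.7588, 146.2, 7.62, 6464),
    Variant("MXFP8", 1.4556, 205.6, 4.98, 3660),
    Variant("Int8", 1.5720, 205.2, 5.06, 3747),
    Variant("MXFP4", 2.3944, 251.9, 3.61, 2260),
    Variant("Int4", 2.2879, 252.6, 3.7, 2347),
]


def best_under_budget(max_mem_gb: float) -> Optional[Variant]:
    """Return the lowest-CER variant whose peak memory fits the budget."""
    candidates = [v for v in VARIANTS if v.mem_gb <= max_mem_gb]
    return min(candidates, key=lambda v: v.cer) if candidates else None


for budget in (4.0, 6.0):
    pick = best_under_budget(budget)
    print(f"{budget:.1f} GB budget -> {pick.name if pick else 'no variant fits'}")
```

With a 4 GB budget this picks Int4, and with 6 GB it picks MXFP8, which also has the lowest CER in the table.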
Usage
```bash
pip install mlx-vlm
```

```python
from mlx_vlm import load, generate

model, processor = load("sahilchachra/unlimited-ocr-mxfp4-mlx")

# Single-image OCR (Gundam mode)
response = generate(
    model,
    processor,
    prompt="<image>document parsing.",
    image="path/to/document.jpg",
    max_tokens=4096,
    verbose=True,
)
```
Prompting guide
Unlimited-OCR uses the DeepSeek-OCR prompt vocabulary. The prompt is just an instruction;
prefix it with <|grounding|> whenever you also want bounding boxes for what was read.
| Task | Prompt |
|---|---|
| Document → Markdown (layout-aware, with boxes) | <image><\|grounding\|>Convert the document to markdown. |
| Plain text OCR (just the text, no layout) | <image>Free OCR. |
| OCR with bounding boxes | <image><\|grounding\|>OCR this image. |
| Native parse | <image>document parsing. |
| Parse a figure / chart / diagram | <image>Parse the figure. |
| Describe the image (general VQA) | <image>Describe this image in detail. |
Note: Unlike the GGUF/llama.cpp workflow, mlx-vlm requires the literal
<image> token in the prompt and a separate image= argument pointing to the file path.
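The prompt rules above (literal image token first, optional grounding prefix, then the instruction) can be wrapped in a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, and placing `<|grounding|>` directly after `<image>` is an assumption about token order.

```python
IMAGE_TOKEN = "<image>"
GROUNDING_TOKEN = "<|grounding|>"


def build_prompt(instruction: str, grounding: bool = False) -> str:
    """Compose a prompt string for mlx-vlm.

    Assumption: the grounding token goes immediately after the image token.
    """
    prefix = GROUNDING_TOKEN if grounding else ""
    return f"{IMAGE_TOKEN}{prefix}{instruction.strip()}"


print(build_prompt("Free OCR."))
print(build_prompt("document parsing.", grounding=True))
```

The resulting string is what would be passed as `prompt=`, with the file path still supplied separately through `image=`.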
Understanding the output (grounding tokens)
With <|grounding|>, the model interleaves the recognized text with detection boxes:
```text
<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623
<|det|>text [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra
<|det|>text [37, 483, 329, 543]<|/det|>Total Due: $44.00
```
Each [x1, y1, x2, y2] is the bounding box (top-left → bottom-right) of that span. Drop the
<|det|>...<|/det|> tags if you only want the text, or parse them to overlay boxes / build a layout.
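Both options, dropping the tags or parsing them, can be sketched with a regular expression. The pattern below assumes the output looks exactly like the sample above: one `<|det|>label [x1, y1, x2, y2]<|/det|>` tag per line, followed by its text. The names `Span`, `parse_grounded`, and `strip_grounding` are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumes one detection tag per line, formatted as in the sample output.
DET_PATTERN = re.compile(
    r"<\|det\|>\s*(?P<label>\w+)\s*"
    r"\[(?P<box>\s*\d+\s*(?:,\s*\d+\s*){3})\]\s*<\|/det\|>"
    r"(?P<text>[^\n]*)"
)


class Span(NamedTuple):
    label: str
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2)
    text: str


def parse_grounded(output: str) -> List[Span]:
    """Extract (label, box, text) triples from grounded output."""
    spans = []
    for match in DET_PATTERN.finditer(output):
        x1, y1, x2, y2 = (int(v) for v in match.group("box").split(","))
        spans.append(Span(match.group("label"), (x1, y1, x2, y2), match.group("text").strip()))
    return spans


def strip_grounding(output: str) -> str:
    """Remove the detection tags and keep only the recognized text."""
    return re.sub(r"<\|det\|>.*?<\|/det\|>", "", output)


sample = (
    "<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623\n"
    "<|det|>text [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra\n"
    "<|det|>text [37, 483, 329, 543]<|/det|>Total Due: $44.00\n"
)

for span in parse_grounded(sample):
    print(span.label, span.box, span.text)
print(strip_grounding(sample))
```

The parsed boxes can then be drawn over the page image or sorted by position to rebuild a layout.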
Tip — long documents: For multi-page scans, run page-by-page and concatenate.
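The page-by-page approach can be sketched as a small loop. To keep the example independent of the model, `parse_pages` takes the per-page OCR step as a callable. Both `parse_pages` and `ocr_page` are hypothetical names, and the commented wiring to mlx-vlm is an assumption that mirrors the usage example in this card.

```python
from typing import Callable, Iterable


def parse_pages(
    page_paths: Iterable[str],
    ocr_page: Callable[[str], str],
    separator: str = "\n\n",
) -> str:
    """Run OCR on each page in order and join the non-empty results."""
    texts = []
    for path in page_paths:
        text = ocr_page(path).strip()
        if text:
            texts.append(text)
    return separator.join(texts)


# Possible wiring to mlx-vlm (not executed here; it needs the model weights):
#
#   from mlx_vlm import load, generate
#   model, processor = load("sahilchachra/unlimited-ocr-mxfp4-mlx")
#
#   def ocr_page(path):
#       result = generate(model, processor, prompt="<image>document parsing.",
#                         image=path, max_tokens=4096)
#       # Depending on the mlx-vlm version, generate may return a string
#       # or a result object exposing the text as `.text`.
#       return getattr(result, "text", result)
#
#   markdown = parse_pages(["page_001.jpg", "page_002.jpg"], ocr_page)
```

Passing the OCR step in as a function also makes it easy to swap prompts per page or to test the concatenation logic without loading the model.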
Important — model_type mapping
The original baidu/Unlimited-OCR uses model_type: "unlimited-ocr" which is not directly
recognized by mlx-vlm. This quantized variant ships with the config already patched:
- config.json → "model_type": "deepseekocr" (was "unlimited-ocr"), auto_map removed
- processor_config.json → "processor_class": "DeepseekOCRProcessor" (was "UnlimitedOCRHFProcessor")
No manual patching needed — just load() and go.
If you are converting the original model yourself, apply these two changes before running
mlx_vlm convert.
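For anyone converting the original checkpoint, the two changes can be applied with a short script. This is a minimal sketch that assumes a local copy of the model directory containing both JSON files. The function name `patch_for_mlx_vlm` and the example directory path are hypothetical.

```python
import json
from pathlib import Path


def patch_for_mlx_vlm(model_dir: str) -> None:
    """Apply the two config changes described above, in place."""
    root = Path(model_dir)

    # config.json: switch model_type and drop auto_map.
    config_path = root / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["model_type"] = "deepseekocr"
    config.pop("auto_map", None)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")

    # processor_config.json: point at the DeepSeek-OCR processor class.
    processor_path = root / "processor_config.json"
    processor = json.loads(processor_path.read_text(encoding="utf-8"))
    processor["processor_class"] = "DeepseekOCRProcessor"
    processor_path.write_text(json.dumps(processor, indent=2) + "\n", encoding="utf-8")


# Example call on a hypothetical local download of the base model:
# patch_for_mlx_vlm("./Unlimited-OCR")
```

All other keys in both files are left untouched, so the script can be run before conversion without affecting the rest of the configuration.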
All variants in this collection
MLX (Apple Silicon — this collection)
| Model | Variant | Disk |
|---|---|---|
| sahilchachra/unlimited-ocr-4bit-mlx | Affine int4 | 2347 MB |
| sahilchachra/unlimited-ocr-8bit-mlx | Affine int8 | 3747 MB |
| sahilchachra/unlimited-ocr-mxfp4-mlx | Block float MX FP4 ← this model | 2260 MB |
| sahilchachra/unlimited-ocr-mxfp8-mlx | Block float MX FP8 | 3660 MB |
GGUF (llama.cpp — cross-platform)
| Model | Notes |
|---|---|
| sahilchachra/Unlimited-OCR-GGUF | K-quants & i-quants (BF16 → IQ2_M). Requires llama.cpp PR #17400. |
Credits
- Base model: baidu/Unlimited-OCR (MIT) — builds on deepseek-ai/DeepSeek-OCR.
- Quantized by sahilchachra.