Unlimited-OCR — GGUF
GGUF quantizations of baidu/Unlimited-OCR, a 3B vision-language OCR model that pushes DeepSeek-OCR one step further (one-shot, long-horizon document parsing). This repo contains a full spread of K-quants and i-quants of the language model plus the vision projector (mmproj) needed for image input.
⚠️ Requires a DeepSeek-OCR-aware llama.cpp build (PR #17400). Unlimited-OCR uses the DeepSeek-OCR architecture (a SAM+CLIP DeepEncoder vision tower with a DeepSeek-V2 MoE text decoder). Support is not yet merged into upstream `main`, so stock llama.cpp will not load these files. Build the PR branch (instructions below).
Files
Every run needs two files: one language model GGUF (pick a quant) plus the shared vision projector. The projector is fp16 and identical for all quants.
| File | Quant | Bits | Size | Notes |
|---|---|---|---|---|
| `Unlimited-OCR-BF16.gguf` | BF16 | 16 | 5.47 GiB | Full-precision conversion. The base every quant is made from; reference quality. |
| `Unlimited-OCR-Q8_0.gguf` | Q8_0 | 8 | 2.91 GiB | Near-lossless. Best quality short of BF16; recommended if you have the disk/RAM. |
| `Unlimited-OCR-Q6_K.gguf` | Q6_K | 6 | 2.43 GiB | Very high quality, essentially indistinguishable from Q8_0 for OCR. |
| `Unlimited-OCR-Q5_K_M.gguf` | Q5_K_M | 5 | 2.07 GiB | High quality. Great balance when you can spare a bit more than Q4. |
| `Unlimited-OCR-Q5_K_S.gguf` | Q5_K_S | 5 | 1.95 GiB | High quality, slightly smaller than Q5_K_M. |
| `Unlimited-OCR-Q4_K_M.gguf` | Q4_K_M | 4 | 1.82 GiB | Recommended default: best overall size/quality trade-off. |
| `Unlimited-OCR-Q4_K_S.gguf` | Q4_K_S | 4 | 1.68 GiB | Slightly smaller than Q4_K_M with a small quality cost. |
| `Unlimited-OCR-Q3_K_M.gguf` | Q3_K_M | 3 | 1.45 GiB | Compact. Usable when memory is tight; some quality loss. |
| `Unlimited-OCR-IQ4_XS.gguf` | IQ4_XS | 4 | 1.53 GiB | i-quant: smaller than Q4_K_S at similar quality (built with imatrix). |
| `Unlimited-OCR-IQ4_NL.gguf` | IQ4_NL | 4 | 1.59 GiB | i-quant (non-linear): 4-bit tuned for ARM/edge; good on Jetson/Apple. |
| `Unlimited-OCR-IQ3_M.gguf` | IQ3_M | 3 | 1.35 GiB | i-quant: solid 3-bit quality for the size (imatrix). |
| `Unlimited-OCR-IQ3_XXS.gguf` | IQ3_XXS | 3 | 1.24 GiB | i-quant: very small 3-bit; noticeable quality loss but runnable. |
| `Unlimited-OCR-IQ2_M.gguf` | IQ2_M | 2 | 1.15 GiB | i-quant: smallest here; experimental, lowest quality; for tight memory only. |
Vision projector (required for all of the above):
| File | Type | Size |
|---|---|---|
| `mmproj-Unlimited-OCR-F16.gguf` | F16 | 774.27 MiB |
Sizes are the on-disk GGUF sizes. The vision encoder is kept at F16 (not quantized) — it is small and quantizing it hurts OCR accuracy. i-quants were built with an importance matrix (imatrix) computed from a general-text calibration set.
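Because every run needs one quant plus the projector, the real download is the sum of the two. The sketch below adds them up from the sizes listed above and picks the largest quant that fits a disk budget. Treating "largest file" as a stand-in for "best quality" is an assumption of this sketch, and on-disk size is not the same as runtime memory (context and compute buffers come on top). The function names are illustrative, not part of any tool.

```python
from typing import Optional

# On-disk sizes copied from the tables above (GiB).
QUANT_SIZES_GIB = {
    "BF16": 5.47, "Q8_0": 2.91, "Q6_K": 2.43, "Q5_K_M": 2.07, "Q5_K_S": 1.95,
    "Q4_K_M": 1.82, "Q4_K_S": 1.68, "IQ4_NL": 1.59, "IQ4_XS": 1.53,
    "Q3_K_M": 1.45, "IQ3_M": 1.35, "IQ3_XXS": 1.24, "IQ2_M": 1.15,
}
MMPROJ_GIB = 774.27 / 1024  # the projector is listed in MiB


def total_download_gib(quant: str) -> float:
    """Size of one language-model GGUF plus the shared vision projector."""
    return QUANT_SIZES_GIB[quant] + MMPROJ_GIB


def largest_quant_within(budget_gib: float) -> Optional[str]:
    """Largest quant whose total download fits the budget, or None if none fits."""
    fitting = [q for q in QUANT_SIZES_GIB if total_download_gib(q) <= budget_gib]
    return max(fitting, key=QUANT_SIZES_GIB.get, default=None)


if __name__ == "__main__":
    print(f"Q4_K_M + mmproj: {total_download_gib('Q4_K_M'):.2f} GiB")
    print("Best fit under 3 GiB:", largest_quant_within(3.0))
```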
Build llama.cpp with DeepSeek-OCR support
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
git fetch origin pull/17400/head:pr17400 && git checkout pr17400
cmake -B build -DCMAKE_BUILD_TYPE=Release # add -DGGML_CUDA=ON for NVIDIA
cmake --build build -j --target llama-mtmd-cli llama-server
Quick start
Download one quant + the projector (you always need both):
huggingface-cli download sahilchachra/Unlimited-OCR-GGUF \
--include "Unlimited-OCR-Q4_K_M.gguf" "mmproj-Unlimited-OCR-F16.gguf" --local-dir ./uocr
Run it on an image:
./build/bin/llama-mtmd-cli \
-m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
--image document.png \
-p "<|grounding|>Convert the document to markdown." \
--chat-template deepseek-ocr --temp 0
`--chat-template deepseek-ocr` and `--mmproj` are required. With `--image`, the image is injected automatically; you do not need to type a literal `<image>` token in `-p`. Use `--temp 0` for OCR (deterministic). Add `-n 4096` (or more) for long/dense documents.
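If you drive the CLI from a script, it can help to assemble the argument list in one place so the required flags are never forgotten. This is a minimal sketch that only uses the flags shown above; the function names and the default binary path are assumptions for illustration, and `run_ocr` simply returns whatever the CLI prints to stdout.

```python
import subprocess
from typing import List


def build_mtmd_command(model: str, mmproj: str, image: str, prompt: str,
                       binary: str = "./build/bin/llama-mtmd-cli",
                       max_tokens: int = 4096) -> List[str]:
    """Build the llama-mtmd-cli argument list with the flags used in this README."""
    return [
        binary,
        "-m", model,
        "--mmproj", mmproj,                 # required vision projector
        "--image", image,
        "-p", prompt,
        "--chat-template", "deepseek-ocr",  # required chat template
        "--temp", "0",                      # deterministic decoding for OCR
        "-n", str(max_tokens),              # room for long/dense documents
    ]


def run_ocr(model: str, mmproj: str, image: str, prompt: str) -> str:
    """Run the CLI once and return its stdout (raises if the process fails)."""
    cmd = build_mtmd_command(model, mmproj, image, prompt)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with the `<|...|>` tokens in the prompt.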
Prompting guide
Unlimited-OCR uses the DeepSeek-OCR prompt vocabulary. The prompt is just an instruction;
prefix it with <|grounding|> whenever you also want bounding boxes for what was read.
| Task | Prompt (`-p`) |
|---|---|
| Document → Markdown (layout-aware, with boxes) | `<\|grounding\|>Convert the document to markdown.` |
| Plain text OCR (just the text, no layout) | `Free OCR.` |
| OCR with bounding boxes | `<\|grounding\|>…` |
| Native Unlimited-OCR parse | `document parsing.` |
| Parse a figure / chart / diagram | `Parse the figure.` |
| Describe the image (general VQA) | `Describe this image in detail.` |
| Find specific text (referring grounding) | `<\|grounding\|>Locate <\|ref\|>Invoice Number<\|/ref\|> in the image.` |
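The grounded prompts in this README are plain string concatenation, which is easy to get subtly wrong by hand. A small sketch, assuming only the token spellings that appear in this document; the helper names are made up for illustration:

```python
GROUNDING = "<|grounding|>"


def with_grounding(instruction: str) -> str:
    """Prefix an instruction so the model also emits bounding boxes."""
    return GROUNDING + instruction


def locate_prompt(target: str) -> str:
    """Referring-grounding prompt: ask where a specific string is in the image."""
    return with_grounding(f"Locate <|ref|>{target}<|/ref|> in the image.")


if __name__ == "__main__":
    print(with_grounding("Convert the document to markdown."))
    print(locate_prompt("Invoice Number"))
```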
Worked examples
1) Document → clean Markdown (tables, headings, reading order):
./build/bin/llama-mtmd-cli -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf --chat-template deepseek-ocr \
--image invoice.png --temp 0 -n 4096 \
-p "<|grounding|>Convert the document to markdown."
2) Just the raw text, no layout / no boxes:
./build/bin/llama-mtmd-cli -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf --chat-template deepseek-ocr \
--image receipt.jpg --temp 0 -p "Free OCR."
3) Locate a specific string and get its box:
./build/bin/llama-mtmd-cli -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf --chat-template deepseek-ocr \
--image form.png --temp 0 \
-p "<|grounding|>Locate <|ref|>Invoice Number<|/ref|> in the image."
Understanding the output (grounding tokens)
With <|grounding|>, the model interleaves the recognized text with detection boxes:
<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623
<|det|>text [37, 194, 350, 247]<|/det|>Bill To: Sahil Chachra
<|det|>text [37, 483, 329, 543]<|/det|>Total Due: $44.00
Each [x1, y1, x2, y2] is the bounding box (top-left → bottom-right) of that span, in the
coordinate space of the model's input image. Drop the <|det|>...<|/det|> tags if you only
want the text, or parse them to overlay boxes / build a layout. Without <|grounding|> you get
plain text (or Markdown) with no box tags.
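Parsing the tags can be sketched with a regular expression. The pattern below is written against the sample output shown above only (`<|det|>label [x1, y1, x2, y2]<|/det|>text`); if your build emits a different tag layout, the pattern will need adjusting. `Span`, `parse_grounded`, and `strip_boxes` are illustrative names, not part of llama.cpp.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    label: str
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in model-input coordinates
    text: str


# Assumes the layout of the sample above: tag, label, bracketed box, closing tag, text.
DET_RE = re.compile(
    r"<\|det\|>\s*(?P<label>[^\[\]<]*?)\s*"
    r"\[\s*(?P<x1>\d+)\s*,\s*(?P<y1>\d+)\s*,\s*(?P<x2>\d+)\s*,\s*(?P<y2>\d+)\s*\]"
    r"\s*<\|/det\|>(?P<text>.*?)(?=<\|det\|>|\Z)",
    re.DOTALL,
)


def parse_grounded(output: str) -> List[Span]:
    """Split grounded output into (label, box, text) spans."""
    spans = []
    for m in DET_RE.finditer(output):
        box = tuple(int(m.group(k)) for k in ("x1", "y1", "x2", "y2"))
        spans.append(Span(m.group("label"), box, m.group("text").strip()))
    return spans


def strip_boxes(output: str) -> str:
    """Drop the <|det|>...<|/det|> tags and keep only the recognized text."""
    return re.sub(r"<\|det\|>.*?<\|/det\|>", "", output, flags=re.DOTALL)


if __name__ == "__main__":
    sample = (
        "<|det|>title [37, 64, 464, 132]<|/det|>INVOICE #2026-0623\n"
        "<|det|>text [37, 483, 329, 543]<|/det|>Total Due: $44.00"
    )
    for span in parse_grounded(sample):
        print(span.label, span.box, span.text)
```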
Tip (long documents): Unlimited-OCR targets one-shot long-horizon parsing. For multi-page scans, run page-by-page and concatenate. If output ever repeats/loops on a dense page, add a mild repetition penalty, e.g. `--repeat-penalty 1.05`, and keep `--temp 0`.
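The page-by-page approach is a simple loop. In this sketch `ocr_page` is a hypothetical callable that you supply (for example a wrapper around `llama-mtmd-cli` or the server API) that takes an image path and returns that page's text; the separator between pages is also an arbitrary choice.

```python
from typing import Callable, Iterable


def ocr_document(page_images: Iterable[str],
                 ocr_page: Callable[[str], str],
                 separator: str = "\n\n") -> str:
    """OCR each page in order and join the non-empty results."""
    parts = [ocr_page(path).strip() for path in page_images]
    return separator.join(part for part in parts if part)


if __name__ == "__main__":
    # Stand-in for a real OCR call, used only to show the control flow.
    fake_ocr = lambda path: f"text of {path}\n"
    print(ocr_document(["page-1.png", "page-2.png"], fake_ocr))
```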
Serving (OpenAI-compatible API)
./build/bin/llama-server \
-m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
--mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
--chat-template deepseek-ocr -c 8192 --host 0.0.0.0 --port 8080
Call it with an image (base64 data URL):
IMG=$(base64 -w0 document.png)
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"temperature": 0,
"messages": [{ "role": "user", "content": [
{ "type": "text", "text": "<|grounding|>Convert the document to markdown." },
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,'"$IMG"'" } }
]}]
}'
Python (OpenAI SDK) is identical — point base_url at http://localhost:8080/v1, send a
text part with the prompt above and an image_url part with the data URL.
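A minimal sketch of that Python call, assuming the server from the command above is listening on port 8080 and the `openai` package is installed. The `model` value and the API key are placeholders: the SDK requires both, but nothing in this README says the server checks them. `build_messages` and `ocr_via_server` are illustrative names.

```python
import base64
from typing import Any, Dict, List


def build_messages(image_bytes: bytes, prompt: str,
                   mime: str = "image/png") -> List[Dict[str, Any]]:
    """Build one user message with a text part and a base64 data-URL image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }]


def ocr_via_server(image_path: str,
                   prompt: str = "<|grounding|>Convert the document to markdown.",
                   base_url: str = "http://localhost:8080/v1") -> str:
    """Send one image to the local server and return the model's reply."""
    from openai import OpenAI  # imported lazily; needs `pip install openai`

    client = OpenAI(base_url=base_url, api_key="not-needed")  # placeholder key
    with open(image_path, "rb") as f:
        messages = build_messages(f.read(), prompt)
    response = client.chat.completions.create(
        model="unlimited-ocr",  # placeholder name, an assumption of this sketch
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message.content
```

Unlike `base64 -w0` in the shell example, `base64.b64encode` never inserts line breaks, so the data URL is the same on Linux and macOS.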
About the model
- Architecture: `DeepseekOCRForCausalLM`. DeepEncoder vision (SAM-ViT-B + CLIP-L/14, 1024×1024 input, 16× downsample) → linear projector → DeepSeek-V2 MoE text decoder (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token).
- Task: multilingual OCR / document parsing: single image, multi-page, and PDF (one-shot long-horizon parsing). The original supports gundam (crop) and base resolution modes.
- License: MIT (inherited from the base model).
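The architecture numbers above imply a small vision-token budget and sparse expert use, which a few lines of arithmetic make concrete. This is a back-of-the-envelope sketch with three assumptions that are not stated above: a ViT patch size of 16, that "16× downsample" means a 16-fold reduction in token count, and that "6 experts/token" counts routed experts only, with the shared experts always active.

```python
IMAGE_SIDE = 1024        # from the architecture line above
PATCH_SIZE = 16          # ASSUMPTION: patch size is not stated in this README
TOKEN_DOWNSAMPLE = 16    # ASSUMPTION: "16x downsample" read as token-count reduction

patches = (IMAGE_SIDE // PATCH_SIZE) ** 2        # 64 * 64 = 4096 patch tokens
vision_tokens = patches // TOKEN_DOWNSAMPLE      # 4096 / 16 = 256 tokens to the decoder

ROUTED_EXPERTS = 64
SHARED_EXPERTS = 2
ROUTED_PER_TOKEN = 6     # ASSUMPTION: the "6 experts/token" are routed experts

experts_per_token = ROUTED_PER_TOKEN + SHARED_EXPERTS   # 8 experts run per token
routed_fraction = ROUTED_PER_TOKEN / ROUTED_EXPERTS     # 0.09375 of routed experts

if __name__ == "__main__":
    print(f"vision tokens per 1024x1024 image: {vision_tokens}")
    print(f"experts active per token: {experts_per_token}")
    print(f"share of routed experts used per token: {routed_fraction:.1%}")
```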
How these were made
- Converted `baidu/Unlimited-OCR` to GGUF with the PR #17400 `convert_hf_to_gguf.py`. The converter targets DeepSeek-OCR, so the config's top-level `architectures` was set to `DeepseekOCRForCausalLM` and `language_config.architectures` to `DeepseekV2ForCausalLM` (the model is otherwise byte-identical to DeepSeek-OCR's tensor layout).
- Exported the text decoder (BF16) and the vision tower (`--mmproj`, F16) separately.
- Built an importance matrix from a general-text corpus and produced the K-/i-quants with `llama-quantize`.
- Verified: the BF16 GGUF + mmproj correctly OCR a test document (text + grounding boxes) via `llama-mtmd-cli` before quantizing.
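The config change in the first step can be sketched as a small edit to the model's `config.json`. This is an illustration of the idea, not the exact script used for this repo; it assumes the usual Hugging Face convention that `architectures` is a list of class names, and the function name is hypothetical.

```python
import copy
from typing import Any, Dict


def patch_config_for_deepseek_ocr(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with the architecture names the converter expects."""
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    patched["architectures"] = ["DeepseekOCRForCausalLM"]
    language_config = patched.setdefault("language_config", {})
    language_config["architectures"] = ["DeepseekV2ForCausalLM"]
    return patched


if __name__ == "__main__":
    # Placeholder input; the real values come from the base model's config.json.
    original = {"architectures": ["SomeOriginalArch"], "language_config": {"hidden_size": 1280}}
    print(patch_config_for_deepseek_ocr(original))
```

To apply it to a checkout, load `config.json` with `json.load`, pass the dict through the function, and write the result back with `json.dump` before running the converter.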
Limitations
- Needs the PR #17400 llama.cpp build until DeepSeek-OCR support lands in `main`.
- Very low-bit i-quants (IQ3_XXS, IQ2_M) trade real accuracy for size; prefer Q4_K_M or higher for production OCR.
- The vision encoder runs in fp16 regardless of the chosen text quant.
Credits
- Base model: baidu/Unlimited-OCR (MIT) — builds on deepseek-ai/DeepSeek-OCR.
- GGUF / DeepSeek-OCR llama.cpp support: ggml-org/llama.cpp#17400.
- Quantized by sahilchachra.