Commit ba8d352 by richardyoung (verified)
1 Parent(s): c78689d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +118 -0
README.md ADDED
@@ -0,0 +1,118 @@
+ ---
+ license: apache-2.0
+ base_model: allenai/olmOCR-2-7B-1025
+ tags:
+ - mlx
+ - vision
+ - ocr
+ - quantized
+ - apple-silicon
+ ---
+
+ # olmOCR-2-7B-1025-MLX-8bit
+
+ This is an 8-bit quantized version of [allenai/olmOCR-2-7B-1025](https://huggingface.co/allenai/olmOCR-2-7B-1025) optimized for Apple Silicon using MLX.
+
+ ## Model Description
+
+ olmOCR-2 is a state-of-the-art OCR (Optical Character Recognition) vision-language model fine-tuned from Qwen2.5-VL-7B-Instruct. This 8-bit quantized version shrinks the weights from ~14 GB (BF16) to 8.4 GB and is the highest-precision of the MLX variants listed under Model Variants.
+
+ - **Base Model:** allenai/olmOCR-2-7B-1025
+ - **Quantization:** 8-bit using MLX
+ - **Model Size:** 8.4 GB (down from ~14 GB BF16)
+ - **Size Reduction:** ~40%
+
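The "~40%" figure follows directly from the two sizes quoted in the list. A minimal sketch of that arithmetic, where the constant and function names are illustrative and not part of any library:

```python
# Sizes quoted in the model card, in GB.
BF16_SIZE_GB = 14.0  # approximate size of the BF16 base weights
Q8_SIZE_GB = 8.4     # size of this 8-bit MLX build


def size_reduction(original_gb: float, quantized_gb: float) -> float:
    """Return the fractional size reduction, e.g. 0.4 for a 40% smaller file."""
    return 1.0 - quantized_gb / original_gb


reduction = size_reduction(BF16_SIZE_GB, Q8_SIZE_GB)
print(f"Size reduction: {reduction:.0%}")
```

Because the BF16 size is itself approximate, the result should be read as a rough figure.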
+ ## Performance
+
+ The base olmOCR-2 model achieves **82.4 points on olmOCR-Bench**, representing state-of-the-art performance for real-world OCR of English-language digitized print documents. This card does not report a separate score for the 8-bit quantization. The base model was additionally fine-tuned with GRPO reinforcement learning to improve performance on:
+ - Math equations
+ - Tables
+ - Complex layouts
+ - Handwriting
+
+ ## Usage
+
+ ### Requirements
+
+ ```bash
+ pip install mlx-vlm
+ ```
+
+ ### Basic Usage
+
+ ```python
+ from mlx_vlm import load, generate
+ from PIL import Image
+
+ # Load the quantized model and its processor
+ model, processor = load("richardyoung/olmOCR-2-7B-1025-MLX-8bit")
+
+ # Load your image
+ image = Image.open("document.png")
+
+ # Extract text. prompt and image are passed by keyword because their
+ # positional order has differed between mlx-vlm releases.
+ prompt = "Extract all text from this image."
+ output = generate(model, processor, prompt=prompt, image=image, max_tokens=2048)
+ print(output)
+ ```
+
+ ### Command Line
+
+ ```bash
+ python -m mlx_vlm.generate \
+ --model richardyoung/olmOCR-2-7B-1025-MLX-8bit \
+ --image document.png \
+ --prompt "Extract all text from this image." \
+ --max-tokens 2048
+ ```
+
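For more than one page, the same command can be generated per image. The sketch below only builds and prints the argument lists and runs nothing. It reuses the flags from the command above; the `scans` folder, the `*.png` pattern, and the helper names are hypothetical.

```python
import shlex
from pathlib import Path

MODEL_ID = "richardyoung/olmOCR-2-7B-1025-MLX-8bit"
PROMPT = "Extract all text from this image."


def build_command(image_path, max_tokens=2048):
    """Return the argv list for one mlx_vlm.generate invocation."""
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", MODEL_ID,
        "--image", str(image_path),
        "--prompt", PROMPT,
        "--max-tokens", str(max_tokens),
    ]


def commands_for_folder(folder, pattern="*.png"):
    """Build one command per matching image in `folder`, sorted by file name."""
    return [build_command(path) for path in sorted(Path(folder).glob(pattern))]


if __name__ == "__main__":
    # "scans" is a hypothetical folder name; the commands are printed, not executed.
    for cmd in commands_for_folder("scans"):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it. Since every invocation reloads the model, the Python API is likely the better fit for large batches.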
+ ## Quantization Details
+
+ - **Method:** MLX native quantization
+ - **Bits:** 8-bit
+ - **Group Size:** MLX default
+ - **Recommended for:** Users who prioritize quality and have sufficient RAM (10+ GB)
+
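To make "bits" and "group size" concrete, here is a minimal pure-Python sketch of group-wise affine quantization, the general scheme in which each group of weights shares one scale and one offset. It illustrates the idea only and is not MLX's implementation. The group size of 64 and the 16-bit scale and offset per group are assumptions, since the card only says "Default".

```python
import math

GROUP_SIZE = 64  # assumption: the card does not state the actual group size
BITS = 8


def quantize_group(values, bits=BITS):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Recover approximate floats from the codes and the per-group scale and bias."""
    return [c * scale + bias for c in codes]


def effective_bits_per_weight(bits=BITS, group_size=GROUP_SIZE, overhead_bits=32):
    """Bits per weight once the per-group scale and bias (assumed 16 bits each) are counted."""
    return bits + overhead_bits / group_size


weights = [math.sin(i) for i in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max round-trip error: {max_error:.6f} (scale {scale:.6f})")
print(f"effective bits per weight: {effective_bits_per_weight()}")
```

The round-trip error is bounded by half the scale, so smaller groups or more bits tighten it at the cost of more storage.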
+ ## Model Variants
+
+ | Variant | Size | Precision | Use Case |
+ |---------|------|-----------|----------|
+ | [8-bit](https://huggingface.co/richardyoung/olmOCR-2-7B-1025-MLX-8bit) | 8.4 GB | Highest | Best quality, more RAM |
+ | [6-bit](https://huggingface.co/richardyoung/olmOCR-2-7B-1025-MLX-6bit) | 6.4 GB | High | Balanced quality/size |
+ | [4-bit](https://huggingface.co/richardyoung/olmOCR-2-7B-1025-MLX-4bit) | 4.5 GB | Good | Smallest size, less RAM |
+
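One way to read the table is as a lookup: take the largest variant whose weights fit in memory with some room to spare. The sketch below encodes that rule. The sizes come from the table, while the 1.6 GB headroom is an assumption, chosen only so that the 8-bit variant lines up with the 10 GB recommendation given for it. The card states no RAM figures for the 6-bit and 4-bit variants.

```python
# (variant name, weight size in GB) taken from the table, largest first.
VARIANTS = [
    ("8-bit", 8.4),
    ("6-bit", 6.4),
    ("4-bit", 4.5),
]


def pick_variant(available_ram_gb, headroom_gb=1.6):
    """Return the largest variant whose weights plus headroom fit, or None.

    headroom_gb is a hypothetical allowance for activations and the OS.
    """
    for name, size_gb in VARIANTS:
        if size_gb + headroom_gb <= available_ram_gb:
            return name
    return None


for ram in (16, 8.5, 7, 4):
    print(f"{ram} GB available -> {pick_variant(ram)}")
```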
+ ## System Requirements
+
+ - **Platform:** Apple Silicon (M1/M2/M3/M4)
+ - **RAM:** 10+ GB recommended
+ - **OS:** macOS 12.0+
+
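The platform and OS requirements can be checked before downloading several gigabytes of weights. This is a small sketch using only the standard library; the function name is illustrative, and RAM is not checked because the standard library offers no portable way to query it.

```python
import platform


def meets_requirements(system, machine, macos_version):
    """Check platform strings against the listed platform and OS requirements."""
    # Apple Silicon Macs report "Darwin" and "arm64" under a native Python build.
    if system != "Darwin" or machine != "arm64":
        return False
    try:
        major = int(macos_version.split(".")[0])
    except ValueError:
        return False  # empty or malformed version string
    return major >= 12


if __name__ == "__main__":
    ok = meets_requirements(
        platform.system(), platform.machine(), platform.mac_ver()[0]
    )
    print("Requirements met" if ok else "Requirements not met")
```

A Python interpreter running under Rosetta reports `x86_64` as the machine type, so the check fails there even on Apple Silicon hardware.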
+ ## Limitations
+
+ - Optimized primarily for English-language printed documents
+ - May have reduced performance on handwritten text compared to printed text
+ - Requires Apple Silicon hardware for optimal performance
+
+ ## Citation
+
+ ```bibtex
+ @article{olmocr2,
+ title={olmOCR 2: Unit test rewards for document OCR},
+ author={Allen Institute for AI},
+ year={2025}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0 (inherited from base model)
+
+ ## Acknowledgements
+
+ - Base model by [Allen Institute for AI](https://allenai.org/)
+ - Quantized for MLX by richardyoung
+ - Built with [MLX-VLM](https://github.com/Blaizzy/mlx-vlm)
+
+ ---
+
+ *Generated with Claude Code*