- dataset:
    id: allenai/olmOCR-bench
    task_id: overall
  value: 83.2
  notes: "The headers/footers (H&F) test rewards omission rather than transcription, so a model that outputs nothing scores perfectly on it. It is excluded to keep Overall focused on real OCR quality."
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: arxiv_math
  value: 89.6
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: old_scans_math
  value: 85.6
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: table_tests
  value: 89.0
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: old_scans
  value: 42.2
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: multi_column
  value: 84.8
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: long_tiny_text
  value: 91.4
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: nielsr
- dataset:
    id: allenai/olmOCR-bench
    task_id: headers_footers
  value: 19.7
  notes: "Rather than removing headers and footers, our model is trained for full-page transcription, and training explicitly rewards their presence (via flipped RLVR tests). This lowers the score under the original benchmark scoring."
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: staghado
- dataset:
    id: allenai/olmOCR-bench
    task_id: baseline
  value: 99.6
  source:
    url: https://huggingface.co/papers/2601.14251
    name: LightOnOCR technical report
    user: staghado