---
language:
- grt
license: cc-by-4.0
tags:
- ocr
- florence-2
- garo
- northeast-india
- image-to-text
base_model: microsoft/Florence-2-base-ft
metrics:
- character_accuracy
model-index:
- name: MWirelabs/garo-ocr
  results:
  - task:
      type: image-to-text
      name: OCR
    metrics:
    - type: character_accuracy
      value: 93.13
      name: Character Accuracy (1000 samples)
---

# GaroOCR

![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)
![Character Accuracy](https://img.shields.io/badge/Char%20Accuracy-93.13%25-brightgreen)

OCR model for the Garo (grt_Latn) language, fine-tuned from `microsoft/Florence-2-base-ft` on Garo text images.

Developed by **MWire Labs**, Shillong, Meghalaya; part of an ongoing effort to build foundational AI for Northeast Indian languages.

---

## Model Details

| | |
|---|---|
| Base model | `microsoft/Florence-2-base-ft` |
| Parameters | 231M |
| Language | Garo (Achik) |
| Task | OCR (image → text) |
| Training samples | 80,000 |
| Epochs | 5 |
| Character Accuracy | 93.13% (evaluated on 1,000 samples) |

---

## Training Setup

- **Hardware:** NVIDIA A40 (48GB)
- **Precision:** bfloat16
- **Batch size:** 4 (effective 16 with gradient accumulation)
- **Learning rate:** 3e-4 with cosine scheduler
- **Max label length:** 128 tokens
- **Task prompt:** `<OCR>` (Florence-2 uppercase token)

---

## Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Florence-2 ships custom modeling code, so trust_remote_code is required.
processor = AutoProcessor.from_pretrained("MWirelabs/garo-ocr", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "MWirelabs/garo-ocr",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()

image = Image.open("your_image.png").convert("RGB")

# "<OCR>" is the Florence-2 task prompt used during fine-tuning.
inputs = processor(text="<OCR>", images=image, return_tensors="pt")
inputs = {k: v.cuda() for k, v in inputs.items()}
# Match the model's bfloat16 weights; input_ids stay as integers.
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

with torch.no_grad():
    generated = model.generate(
        pixel_values=inputs["pixel_values"],
        input_ids=inputs["input_ids"],
        max_new_tokens=128,  # same as the training max label length
    )

text = processor.tokenizer.decode(generated[0], skip_special_tokens=True)
print(text)
```

> **Note:** Use `transformers==4.38.2` for compatibility.

---

## Limitations

- Reliable output is limited to about 128 tokens, the maximum label length used during training
- Part of MWire Labs' mono-language series; a multilingual NE-OCR model covering more Northeast Indian languages is in development

---
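The card reports character accuracy without defining it. One common convention, assumed here and not taken from the authors' evaluation code, is `1 - (total edit distance / total reference characters)`, expressed as a percentage. The sketch below shows how such a score could be computed for your own predictions; the function names `levenshtein` and `character_accuracy` and the sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_accuracy(predictions, references) -> float:
    """Corpus-level character accuracy in percent (assumed definition)."""
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        return 0.0
    # Clamp at zero: very long wrong predictions can exceed the reference length.
    return max(0.0, 1.0 - total_edits / total_chars) * 100.0


# Placeholder strings, not real evaluation data.
predictions = ["abxd", "hello"]
references = ["abcd", "hello"]
print(f"{character_accuracy(predictions, references):.2f}%")
```

Aggregating edits over the whole corpus, as done here, weights long lines more heavily than averaging per-sample scores would; the reported 93.13% may have used either approach.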