---
library_name: transformers
tags:
- ocr
- trocr
- vision-encoder-decoder
- kazakh
- printed
- document-ai
license: apache-2.0
language:
- kk
metrics:
- cer
- exact_match
base_model: kazars24/trocr-base-handwritten-ru
pipeline_tag: image-to-text
datasets:
- thekamilya/kazakh-printed-dataset
- issai/kazparc
model-index:
- name: Kazakh Printed TrOCR
  results:
  - task:
      type: image-to-text
    dataset:
      name: Kazakh Printed Dataset
      type: custom
    metrics:
    - type: cer
      value: 3.70
      name: Character Error Rate
    - type: exact_match
      value: 48.62
      name: Exact Match (%)
---

# Kazakh Printed TrOCR

This model is a fine-tuned version of `kazars24/trocr-base-handwritten-ru` specifically optimized for recognizing **Kazakh printed text**. It leverages the TrOCR (Transformer-based Optical Character Recognition) architecture, utilizing a Vision Transformer (ViT) encoder and a RoBERTa-based decoder.

## Model Description

The model was adapted to handle Kazakh-specific Cyrillic characters (**ә, ғ, қ, ң, ө, ұ, ү, һ, і**) by resizing token embeddings and training on synthetic data.

- **Developed by:** Kamilya Nazarkhanova
- **Model type:** VisionEncoderDecoder (TrOCR)
- **Language(s):** Kazakh (kk)
- **Finetuned from:** kazars24/trocr-base-handwritten-ru

## Training Data & Lineage

The model was trained on [thekamilya/kazakh-printed-dataset](https://huggingface.co/datasets/thekamilya/kazakh-printed-dataset), which was synthetically generated using text from the [ISSAI KazPARC](https://huggingface.co/datasets/issai/kazparc) corpus.

### Data Generation Pipeline:

To overcome the scarcity of labeled Kazakh OCR data, I developed a robust synthetic generation engine:

- **Environmental Simulation:** Implemented random "Light" and "Dark" mode background logic.
- **Stylistic Diversity:** Randomized font selection, sizes, and text "jitter" to improve spatial invariance.
- **Optical Degradation:** Applied stochastic Gaussian noise, random rotations, and varying JPEG compression artifacts (30–95%) to simulate real-world document quality.

## How to Get Started

```python
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("thekamilya/kazakh-trocr-fine-tuned")
model = VisionEncoderDecoderModel.from_pretrained("thekamilya/kazakh-trocr-fine-tuned")

# Move the model to the GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load the image
image = Image.open("zheke.jpg").convert("RGB")

# Prepare the input and move it to the same device as the model
pixel_values = processor(
    images=image,
    return_tensors="pt"
).pixel_values.to(device)

# Run inference without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(pixel_values)

# Decode the generated token IDs into text
generated_text = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)[0]

print(f"Recognized Text: {generated_text}")
```
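
The metadata above reports a Character Error Rate of 3.70 and an exact-match rate of 48.62%. The evaluation script is not included in this card, so the following is only a minimal sketch of how such metrics are commonly computed on your own predictions. It assumes CER is the corpus-level Levenshtein distance divided by the total number of reference characters, and that exact match means strict string equality. The function names and sample strings are illustrative, not taken from the original training code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference character is missing
                curr[j - 1] + 1,     # insertion: extra character in the hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Assumed definition: total edit distance / total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def exact_match(references, hypotheses) -> float:
    """Fraction of predictions that equal their reference string exactly."""
    matches = sum(r == h for r, h in zip(references, hypotheses))
    return matches / len(references)


# Hypothetical ground truth and model output; the first prediction
# confuses the Kazakh letter "қ" with the Russian "к".
refs = ["қазақ тілі", "Алматы қаласы"]
hyps = ["казақ тілі", "Алматы қаласы"]

print(f"CER: {100 * cer(refs, hyps):.2f}%")
print(f"Exact match: {100 * exact_match(refs, hyps):.2f}%")
```

In practice, `hyps` would be filled with the `generated_text` values produced by the inference snippet above. A single wrong character fails the exact-match check for a whole line while barely moving CER, which is why the two metrics can differ widely.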