---
library_name: transformers
tags:
- ocr
- trocr
- vision-encoder-decoder
- kazakh
- printed
- document-ai
license: apache-2.0
language:
- kk
metrics:
- cer
- exact_match
base_model: kazars24/trocr-base-handwritten-ru
pipeline_tag: image-to-text
datasets:
- thekamilya/kazakh-printed-dataset
- issai/kazparc
model-index:
- name: Kazakh Printed TrOCR
  results:
  - task:
      type: image-to-text
    dataset:
      name: Kazakh Printed Dataset
      type: custom
    metrics:
    - type: cer
      value: 3.70
      name: Character Error Rate
    - type: exact_match
      value: 48.62
      name: Exact Match (%)
---

# Kazakh Printed TrOCR

This model is a fine-tuned version of `kazars24/trocr-base-handwritten-ru`, optimized for recognizing **Kazakh printed text**. It uses the TrOCR (Transformer-based Optical Character Recognition) architecture, which pairs a Vision Transformer (ViT) encoder with a RoBERTa-based decoder.


## Model Description
The model was adapted to handle Kazakh-specific Cyrillic characters (**ә, ғ, қ, ң, ө, ұ, ү, һ, і**) by resizing token embeddings and training on synthetic data.

- **Developed by:** Kamilya Nazarkhanova
- **Model type:** VisionEncoderDecoder (TrOCR)
- **Language(s):** Kazakh (kk)
- **Finetuned from:** kazars24/trocr-base-handwritten-ru
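
The embedding resize mentioned above can be sketched roughly as follows. This is a minimal illustration, not the exact training code: the helper name `extend_vocab` is hypothetical, and it assumes the Kazakh letters were registered as extra tokens before the decoder embeddings were resized. The two calls it relies on, `add_tokens` and `resize_token_embeddings`, are standard Hugging Face tokenizer and model methods.

```python
# Kazakh-specific Cyrillic letters listed in the model description.
KAZAKH_LETTERS = ["ә", "ғ", "қ", "ң", "ө", "ұ", "ү", "һ", "і"]


def extend_vocab(tokenizer, decoder, letters=KAZAKH_LETTERS):
    """Register `letters` as tokens and resize the decoder embeddings to match.

    Hypothetical helper: `tokenizer` is expected to behave like a Hugging Face
    tokenizer (`add_tokens`, `__len__`) and `decoder` like a Hugging Face model
    (`resize_token_embeddings`). Returns the number of tokens actually added.
    """
    num_added = tokenizer.add_tokens(list(letters))
    if num_added > 0:
        # New embedding rows are freshly initialized and must be learned
        # during fine-tuning.
        decoder.resize_token_embeddings(len(tokenizer))
    return num_added


# Assumed usage with a loaded TrOCR processor and model:
#   extend_vocab(processor.tokenizer, model.decoder)
```

Resizing only when tokens were actually added keeps the call a no-op for a tokenizer that already contains these letters.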

## Training Data & Lineage

The model was trained on [thekamilya/kazakh-printed-dataset](https://huggingface.co/datasets/thekamilya/kazakh-printed-dataset), which was synthetically generated using text from the [ISSAI KazPARC](https://huggingface.co/datasets/issai/kazparc) corpus.

### Data Generation Pipeline
To address the scarcity of labeled Kazakh OCR data, I built a synthetic data generator with the following features:
- **Environmental Simulation:** Each sample is rendered with a randomly chosen "Light" or "Dark" mode background.
- **Stylistic Diversity:** Font, font size, and text position offset ("jitter") are randomized so the model is less sensitive to where the text sits in the image.
- **Optical Degradation:** Random Gaussian noise, random rotations, and JPEG compression at varying levels (30–95%) simulate real-world document quality.
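
The steps above can be sketched with Pillow and NumPy. This is an illustrative approximation, not the generator used to build the dataset: the function name `render_sample`, the colors, the font size range, the rotation range, the noise strength, and the canvas sizing are all assumptions, and the 30–95 range is interpreted here as the JPEG `quality` setting.

```python
import io
import random

import numpy as np
from PIL import Image, ImageDraw, ImageFont


def render_sample(text, font_path=None, rng=None):
    """Render one synthetic text-line image (all parameter ranges are assumed)."""
    rng = rng or random.Random()

    # Light or dark mode: choose background and text colors.
    if rng.random() < 0.5:
        bg, fg = (255, 255, 255), (0, 0, 0)
    else:
        bg, fg = (20, 20, 20), (235, 235, 235)

    # Random font size. Real Kazakh text needs a TTF font that covers the
    # Kazakh Cyrillic letters; Pillow's default font is only a fallback.
    size = rng.randint(20, 40)
    font = ImageFont.truetype(font_path, size) if font_path else ImageFont.load_default()

    # Crude canvas estimate (one `size`-wide cell per character) plus padding.
    pad = 16
    img = Image.new("RGB", (2 * pad + len(text) * size, 2 * pad + 2 * size), bg)

    # Jitter: shift the text origin by a few pixels in each direction.
    dx = rng.randint(-pad // 2, pad // 2)
    dy = rng.randint(-pad // 2, pad // 2)
    ImageDraw.Draw(img).text((pad + dx, pad + dy), text, font=font, fill=fg)

    # Small random rotation; exposed corners are filled with the background.
    img = img.rotate(rng.uniform(-3, 3), expand=True, fillcolor=bg)

    # Additive Gaussian noise with a random standard deviation.
    arr = np.asarray(img, dtype=np.float32)
    noise_rng = np.random.default_rng(rng.randrange(2**32))
    noise = noise_rng.normal(0.0, rng.uniform(0, 15), arr.shape)
    img = Image.fromarray(np.clip(arr + noise, 0, 255).astype(np.uint8))

    # JPEG round-trip at a random quality between 30 and 95.
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=rng.randint(30, 95))
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

Passing a seeded `random.Random` as `rng` makes a sample reproducible, which helps when debugging a specific degradation.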

## How to Get Started

```python

import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("thekamilya/kazakh-trocr-fine-tuned")
model = VisionEncoderDecoderModel.from_pretrained("thekamilya/kazakh-trocr-fine-tuned")


# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load image
image = Image.open("zheke.jpg").convert("RGB")

# Preprocess the image and move the tensor to the selected device
pixel_values = processor(
    images=image,
    return_tensors="pt"
).pixel_values.to(device)

# Run inference without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(pixel_values)

# Decode text
generated_text = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)[0]

print(f"Recognized Text: {generated_text}")