Kazakh Printed TrOCR
This model is a fine-tuned version of kazars24/trocr-base-handwritten-ru, adapted to recognize printed Kazakh text. It uses the TrOCR (Transformer-based Optical Character Recognition) architecture: a Vision Transformer (ViT) encoder paired with a RoBERTa-based decoder.
Model Description
The model was adapted to handle Kazakh-specific Cyrillic characters (ә, ғ, қ, ң, ө, ұ, ү, һ, і) by resizing token embeddings and training on synthetic data.
- Developed by: Kamilya Nazarkhanova
- Model type: VisionEncoderDecoder (TrOCR)
- Language(s): Kazakh (kk)
- Finetuned from: kazars24/trocr-base-handwritten-ru
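As a library-free illustration of what "resizing token embeddings" means, the sketch below appends one new row to a toy embedding table for each Kazakh letter missing from a toy vocabulary. The toy vocabulary, the embedding width, and the helper name `extend_vocab_and_embeddings` are assumptions made for this example and are not taken from the actual training code. In Transformers, the corresponding steps are typically `tokenizer.add_tokens(...)` followed by `resize_token_embeddings(...)` on the decoder.

```python
import random

# Kazakh-specific Cyrillic letters listed in the model description.
KAZAKH_LETTERS = "әғқңөұүһі"


def extend_vocab_and_embeddings(vocab, embeddings, new_tokens, seed=0):
    """Return copies of vocab and embeddings with one row appended per unseen token.

    Hypothetical helper for illustration only: existing rows are kept as-is,
    new rows get small random values (an assumed initialization scheme).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    vocab = dict(vocab)
    embeddings = [row[:] for row in embeddings]
    for token in new_tokens:
        if token in vocab:
            continue  # already covered, no new row needed
        vocab[token] = len(embeddings)  # new id = next free row index
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab, embeddings


# Toy setup (assumed): four known tokens, 4-dimensional embeddings.
toy_vocab = {"<s>": 0, "</s>": 1, "а": 2, "б": 3}
toy_embeddings = [[0.0] * 4 for _ in toy_vocab]

new_vocab, new_embeddings = extend_vocab_and_embeddings(
    toy_vocab, toy_embeddings, KAZAKH_LETTERS
)
print(len(toy_embeddings), "->", len(new_embeddings))
```

The new rows start out untrained, which is why the resize has to be followed by fine-tuning on text that actually contains the added characters.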
Training Data & Lineage
The model was trained on thekamilya/kazakh-printed-dataset, which was synthetically generated using text from the ISSAI KazPARC corpus.
Data Generation Pipeline:
To overcome the scarcity of labeled Kazakh OCR data, I developed a robust synthetic generation engine:
- Environmental Simulation: Implemented random "Light" and "Dark" mode background logic.
- Stylistic Diversity: Randomized font selection, sizes, and text "jitter" to improve spatial invariance.
- Optical Degradation: Applied stochastic Gaussian noise, random rotations, and varying JPEG compression artifacts (30–95%) to simulate real-world document quality.
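The randomization described above could be organized as a per-sample parameter draw, sketched below using only the standard library. Only the 30 to 95 JPEG range comes from the text. The field names, the font path, and every other range (font size, jitter, rotation angle, noise level, and the 50/50 light/dark split) are assumptions chosen for illustration, and the rendering step itself is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class RenderParams:
    """One random draw of rendering and degradation settings (hypothetical)."""
    dark_mode: bool       # True = light text on dark background
    font_path: str        # which font file to render with
    font_size: int        # in points
    jitter_px: tuple      # (dx, dy) offset of the text position
    rotation_deg: float   # small random rotation
    noise_sigma: float    # standard deviation of Gaussian pixel noise
    jpeg_quality: int     # JPEG quality setting, 30-95 as stated in the text


def sample_render_params(fonts, rng):
    """Draw one set of parameters; all ranges except JPEG quality are assumed."""
    return RenderParams(
        dark_mode=rng.random() < 0.5,
        font_path=rng.choice(fonts),
        font_size=rng.randint(18, 48),
        jitter_px=(rng.randint(-4, 4), rng.randint(-4, 4)),
        rotation_deg=rng.uniform(-3.0, 3.0),
        noise_sigma=rng.uniform(0.0, 15.0),
        jpeg_quality=rng.randint(30, 95),
    )


# Example draw with a placeholder font path (assumed, not a real file).
rng = random.Random(42)
params = sample_render_params(["fonts/Example-Regular.ttf"], rng)
print(params)
```

Keeping the draw separate from the rendering makes each synthetic image reproducible from its seed and lets the sampled settings be logged next to the label.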
How to Get Started
import torch
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("thekamilya/kazakh-trocr-fine-tuned")
model = VisionEncoderDecoderModel.from_pretrained("thekamilya/kazakh-trocr-fine-tuned")

# Use the GPU if one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load image
image = Image.open("zheke.jpg").convert("RGB")

# Preprocess the image and move the tensor to the same device as the model
pixel_values = processor(
    images=image,
    return_tensors="pt"
).pixel_values.to(device)

# Run generation without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(pixel_values)

# Decode text
generated_text = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)[0]

print(f"Recognized Text: {generated_text}")
Model tree for thekamilya/kazakh-trocr-fine-tuned
Base model
microsoft/trocr-base-handwritten
Datasets used to train thekamilya/kazakh-trocr-fine-tuned
thekamilya/kazakh-printed-dataset
Evaluation results
- Character Error Rate on Kazakh Printed Dataset (self-reported): 3.700
- Exact Match (%) on Kazakh Printed Dataset (self-reported): 48.620
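For reference, the two metrics are commonly computed as shown in the minimal sketch below: Character Error Rate as the total character-level edit distance divided by the total number of reference characters, and Exact Match as the share of predictions identical to their reference. This is the standard definition and not necessarily the exact script behind the numbers above. The function names and the two sample strings are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def character_error_rate(predictions, references):
    """Total edits divided by total reference characters, as a fraction."""
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def exact_match(predictions, references):
    """Fraction of predictions that equal their reference exactly."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


# Illustrative strings (assumed): the first prediction confuses "қ" with "к".
references = ["қала", "мектеп"]
predictions = ["кала", "мектеп"]

print(f"CER: {100 * character_error_rate(predictions, references):.2f}%")
print(f"Exact match: {100 * exact_match(predictions, references):.2f}%")
```

A single wrong character is enough to fail Exact Match for a whole line, so a low CER can coexist with a much lower Exact Match score.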