---
base_model:
- tachiwin/Tachiwin-OCR-1.5
tags:
- text-generation-inference
- transformers
- unsloth
- paddleocr_vl
- trl
- sft
license: apache-2.0
datasets:
- tachiwin/multilingual_ocr_llm_2
metrics:
- cer
library_name: adapter-transformers
model-index:
- name: Tachiwin-OCR-1.5
  results:
  - task:
      type: image-to-text
      name: Optical Character Recognition (OCR)
    dataset:
      name: Tachiwin Multilingual OCR LLM
      type: tachiwin-multilingual-ocr-llm
    metrics:
    - name: Character Error Rate (CER)
      type: cer
      value: 2.03
    - name: Word Error Rate (WER)
      type: wer
      value: 3.6
    - name: OCR Accuracy (1 - CER)
      type: accuracy
      value: 97.97
    - name: Word Accuracy (1 - WER)
      type: word-accuracy
      value: 96.4
pipeline_tag: image-text-to-text
---

# TachiwinOCR 1.5 GGUF 🦡
**for the Indigenous Languages of Mexico**

This is a PaddleOCR-VL fine-tune specialized in the 68 Indigenous languages of Mexico and their diverse character and glyph repertoire, making it a world first in technology access and linguistic rights.

## Inference
You can perform inference using the `PaddleOCR` pipeline or the `transformers` library.

### Option A: Using PaddleOCR
```python
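from huggingface_hub import snapshot_download
from paddleocr import PaddleOCRVL

# Download the fine-tuned weights to a local directory
path_to_tachiwin_downloaded_model = snapshot_download("tachiwin/Tachiwin-OCR-1.5")

# Load the fine-tuned model as the pipeline's VL recognition model
pipeline = PaddleOCRVL(
    vl_rec_model_name="tachiwin/Tachiwin-OCR-1.5",
    vl_rec_model_dir=path_to_tachiwin_downloaded_model,
)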

# Predict on an image
output = pipeline.predict("test.png")

for res in output:
    res.print()
    res.save_to_json(save_path="output")
    res.save_to_markdown(save_path="output")
```

### Option B: Using Transformers
```python
from PIL import Image
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL = "tachiwin/Tachiwin-OCR-1.5"
image_path = "my_image.png"

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

image = Image.open(image_path).convert("RGB")

model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
).to(DEVICE).eval()
processor = AutoProcessor.from_pretrained(MODEL, trust_remote_code=True)

messages = [
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "OCR:"},
    ]}
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(DEVICE)

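outputs = model.generate(**inputs, max_new_tokens=1024, min_new_tokens=1)
# Keep only the newly generated tokens so the prompt is not echoed in the output
generated_ids = outputs[:, inputs["input_ids"].shape[-1]:]
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]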

print(generated_text)
```

---

## 📊 Benchmark Results

Tachiwin-OCR 1.5 was evaluated against the base PaddleOCR-VL 1.5 model on a diverse subset of Indigenous language samples. Fine-tuning lowers the average Character Error Rate (CER) from 17.65% to 2.03% and the Word Error Rate (WER) from 38.59% to 3.60%, far larger gains than those of version 1.0.
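
The card does not spell out how CER and WER are computed. As a minimal sketch, assuming the standard definitions (Levenshtein edit distance divided by reference length, over characters for CER and over whitespace-separated words for WER) and assuming NFC Unicode normalization so that precomposed and combining-diacritic spellings compare equal, the metrics could be computed as below. The helper names are hypothetical, and the actual Tachiwin evaluation may tokenize or normalize differently.

```python
import unicodedata


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences; insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def _nfc(text: str) -> str:
    # Assumption: compare text in NFC so "í" and "i" + U+0301 count as the same character
    return unicodedata.normalize("NFC", text)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = _nfc(reference).split()
    return edit_distance(ref_words, _nfc(hypothesis).split()) / max(len(ref_words), 1)


print(cer("tachiwin", "tachiwín"))      # 0.125 (one substitution over 8 characters)
print(wer("hola mundo", "hola munda"))  # 0.5 (one wrong word out of 2)
```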


### Summary Metrics

| Metric | Base Model (Raw) | Tachiwin-OCR 1.5 (Fine-tuned) | Improvement |
| :--- | :---: | :---: | :---: |
| **Character Error Rate (CER)** | 17.65% | 2.03% | **88.5% (Relative Reduction)** |
| **Word Error Rate (WER)** | 38.59% | 3.60% | **90.7% (Relative Reduction)** |
| **OCR Accuracy (1 − CER)** | 82.35% | 97.97% | **+15.61pp (Absolute)** |
| **Word Accuracy (1 − WER)** | 61.41% | 96.40% | **+34.99pp (Absolute)** |
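
The Improvement column mixes two kinds of figures: relative reductions of the error rate and absolute accuracy gains in percentage points. A short sketch of both calculations from the averaged table values follows (the function names are illustrative). Recomputing from the rounded averages gives +15.62pp for OCR accuracy; the table's +15.61pp likely reflects unrounded averages.

```python
def relative_reduction(raw_error: float, finetuned_error: float) -> float:
    """Share of the raw error removed by fine-tuning, in percent."""
    return (raw_error - finetuned_error) / raw_error * 100


def absolute_gain_pp(raw_error: float, finetuned_error: float) -> float:
    """Accuracy gain in percentage points; with accuracy = 100 - error this equals raw - fine-tuned."""
    return (100 - finetuned_error) - (100 - raw_error)


# Average error rates (in percent) from the Summary Metrics table
RAW_CER, FT_CER = 17.65, 2.03
RAW_WER, FT_WER = 38.59, 3.60

print(f"CER relative reduction: {relative_reduction(RAW_CER, FT_CER):.1f}%")  # 88.5%
print(f"WER relative reduction: {relative_reduction(RAW_WER, FT_WER):.1f}%")  # 90.7%
print(f"OCR accuracy gain: +{absolute_gain_pp(RAW_CER, FT_CER):.2f}pp")       # +15.62pp
print(f"Word accuracy gain: +{absolute_gain_pp(RAW_WER, FT_WER):.2f}pp")      # +34.99pp
```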

### Version Comparison: 1.0 → 1.5

| Metric | Tachiwin-OCR v1.0 | Tachiwin-OCR v1.5 | Δ Change |
| :--- | :---: | :---: | :---: |
| **CER** | 6.80% | 2.03% | **−4.77pp** |
| **WER** | 17.36% | 3.60% | **−13.76pp** |
| **Accuracy (1 − CER)** | 93.20% | 97.97% | **+4.77pp** |
| **Word Accuracy (1 − WER)** | 82.64% | 96.40% | **+13.76pp** |
| **Relative CER Reduction** | 10.4% | 88.5% | **+78.1pp** |
| **Relative WER Reduction** | 31.0% | 90.7% | **+59.7pp** |


### Detailed Comparison — v1.5 Sample Results

Results across 21 samples covering 19 language codes (`zao` and `amu` each appear twice). Languages with tonal or complex diacritic systems show the largest improvements:

| # | Language Code | Raw CER | FT CER | Raw WER | FT WER | CER Improvement (absolute) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | `zpo` (Zapotec) | 0.24% | 0.00% | 1.12% | 0.00% | +0.24% |
| 1 | `maz` (Central Mazahua) | 0.41% | 0.00% | 2.27% | 0.00% | +0.41% |
| 2 | `zao` (Zapotec) | 6.18% | 3.49% | 23.61% | 12.50% | +2.69% |
| 3 | `mat` (Matlatzinca) | 6.51% | 0.00% | 42.55% | 0.00% | +6.51% |
| 4 | `amu` (Amuzgo) | 85.52% | 0.00% | 89.13% | 0.00% | **+85.52%** |
| 5 | `mxp` (Mixe) | 15.91% | 11.87% | 54.90% | 9.80% | +4.04% |
| 6 | `yaq` (Yaqui) | 1.82% | 0.00% | 3.12% | 0.00% | +1.82% |
| 7 | `poe` (Popoloca) | 6.78% | 3.39% | 62.50% | 12.50% | +3.39% |
| 8 | `zpc` (Zapotec) | 9.43% | 2.05% | 42.11% | 13.16% | +7.38% |
| 9 | `sei` (Seri) | 1.89% | 0.00% | 10.61% | 0.00% | +1.89% |
| 10 | `lac` (Lacandon) | 9.80% | 0.00% | 42.31% | 0.00% | +9.80% |
| 11 | `zao` (Zapotec) | 93.01% | 0.00% | 100.00% | 0.00% | **+93.01%** |
| 12 | `mxt` (Mixtec) | 6.70% | 0.00% | 19.18% | 0.00% | +6.70% |
| 13 | `huv` (San Marcos Huistepec Zapotec) | 1.41% | 0.00% | 10.34% | 0.00% | +1.41% |
| 14 | `tee` (Huehuetla Tepehua) | 3.03% | 0.00% | 17.33% | 0.00% | +3.03% |
| 15 | `tzh` (Tzeltal) | 2.67% | 0.00% | 15.91% | 0.00% | +2.67% |
| 16 | `mto` (Totontepec Mixe) | 93.12% | 32.47% | 100.00% | 39.71% | +60.65% |
| 17 | `amu` (Amuzgo) | 14.96% | 2.36% | 52.46% | 1.64% | +12.60% |
| 18 | `mih` (Chayuco Mixtec) | 3.76% | 0.00% | 9.52% | 0.00% | +3.76% |
| 19 | `zpm` (Mixtec) | 6.98% | 0.00% | 32.73% | 0.00% | +6.98% |
| 20 | `toc` (Tojolabal) | 11.32% | 0.00% | 57.14% | 0.00% | +11.32% |
| — | **AVERAGE** | **17.65%** | **2.03%** | **38.59%** | **3.60%** | **+15.61%** |
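
The card does not state how the AVERAGE row is aggregated, and the choice matters: an unweighted mean of the 21 Raw CER values above is about 18.16%, not 17.65%, so the reported averages are not a plain per-sample mean of these rows. The sketch below contrasts the two usual options, a macro average (mean of per-sample CERs) and a micro or corpus-level average (total edits over total reference characters). The `SampleScore` type and the counts in it are hypothetical, not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class SampleScore:
    edits: int      # character edits between reference and OCR output
    ref_chars: int  # characters in the reference transcription

    @property
    def cer(self) -> float:
        return self.edits / self.ref_chars


def macro_cer(samples: list[SampleScore]) -> float:
    """Unweighted mean of per-sample CERs: every sample counts equally."""
    return sum(s.cer for s in samples) / len(samples)


def micro_cer(samples: list[SampleScore]) -> float:
    """Corpus-level CER: total edits over total reference characters, so longer samples weigh more."""
    return sum(s.edits for s in samples) / sum(s.ref_chars for s in samples)


# Hypothetical counts, not benchmark data: one long clean sample, one short noisy one
samples = [SampleScore(edits=0, ref_chars=400), SampleScore(edits=90, ref_chars=100)]
print(f"macro: {macro_cer(samples):.1%}, micro: {micro_cer(samples):.1%}")  # macro: 45.0%, micro: 18.0%
```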

### Key Findings

- **Unprecedented Accuracy Gains:** 15 of the 21 samples reach a fine-tuned CER of **0.00%**, meaning perfect character-level recognition on those samples, a result not seen in v1.0.
- **Hardest Cases Tackled:** Amuzgo (`amu`, sample 4) and Zapotec (`zao`, sample 11) started with CERs of 85.52% and 93.01% and were both reduced to 0.00% after fine-tuning, improvements of over 85 and 93 percentage points respectively.
- **Remaining Challenges:** `mto` (Totontepec Mixe) remains the most difficult language in the set, with a fine-tuned CER of 32.47%. That is still a 65% relative improvement over its raw baseline of 93.12%, but further work is needed for highly complex orthographies.
- **Word-Level Leap:** WER dropped from 38.59% to just 3.60%, a **34.99 percentage point** absolute improvement compared with only 7.81pp in v1.0, showing a qualitative leap in the model's ability to reconstruct full word forms in these language families.
- **Robustness:** The model continues to show high resilience against synthetic distortions applied during the data generation phase.
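
The counts in the findings can be re-derived from the per-sample table. The sketch below transcribes the Raw and fine-tuned CER values (in percent) into a list named `RESULTS` purely for illustration, counts the samples at 0.00%, and finds the sample with the highest remaining CER.

```python
# (language code, raw CER %, fine-tuned CER %) transcribed from the per-sample table
RESULTS = [
    ("zpo", 0.24, 0.00), ("maz", 0.41, 0.00), ("zao", 6.18, 3.49),
    ("mat", 6.51, 0.00), ("amu", 85.52, 0.00), ("mxp", 15.91, 11.87),
    ("yaq", 1.82, 0.00), ("poe", 6.78, 3.39), ("zpc", 9.43, 2.05),
    ("sei", 1.89, 0.00), ("lac", 9.80, 0.00), ("zao", 93.01, 0.00),
    ("mxt", 6.70, 0.00), ("huv", 1.41, 0.00), ("tee", 3.03, 0.00),
    ("tzh", 2.67, 0.00), ("mto", 93.12, 32.47), ("amu", 14.96, 2.36),
    ("mih", 3.76, 0.00), ("zpm", 6.98, 0.00), ("toc", 11.32, 0.00),
]

# Samples whose fine-tuned CER is exactly zero
perfect = [lang for lang, _raw, ft in RESULTS if ft == 0.0]

# Sample with the highest remaining (fine-tuned) CER
hardest = max(RESULTS, key=lambda row: row[2])
lang, raw, ft = hardest

print(f"{len(perfect)} of {len(RESULTS)} samples reach 0.00% fine-tuned CER")  # 15 of 21
print(f"Hardest: {lang}, {ft:.2f}% CER, {(raw - ft) / raw:.0%} relative reduction")  # mto, 32.47%, 65%
```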

**Tachiwin** (Totonac for "language") is dedicated to bridging the digital divide for the Indigenous languages of Mexico through AI technology.

- **Developed by:** Tachiwin
- **License:** apache-2.0
- **Fine-tuned from model:** PaddlePaddle/PaddleOCR-VL-1.5


This paddleocr_vl model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)