Vokturz's picture
Update README.md
985c9a5 verified
|
Raw
History Blame Contribute Delete
2.31 kB
metadata
library_name: transformers
tags:
  - vision-language
  - ocr
  - multimodal
  - qwen
  - lora
  - instruction-tuning
datasets:
  - Vokturz/sourceforge-app-screenshots-ocr
base_model:
  - unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit

Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Model Details

Model Description

Loyca-Qwen3-VL-2B-Instruct-OCR is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for visual text recognition (OCR) and screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

Model Sources


Uses

This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.

The model is not designed for:

  • Handwritten OCR
  • Scene text in natural environments (e.g., street signs)
  • Legal or financial document processing without human review

Training Details

Training Data

The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.

The dataset focuses on clean UI text extraction rather than general image captioning.

Training Hyperparameters

Parameter Value
Epochs 8
Batch size 8
Learning rate 3e-4
LoRA rank 64
LoRA alpha 64
Precision bfloat16 (mixed)
Optimizer AdamW
Scheduler Cosine decay
Gradient accumulation 2
Weight decay 0.01