Update README.md

985c9a5 verified 8 months ago

2.31 kB

library_name: transformers
tags:
  - vision-language
  - ocr
  - multimodal
  - qwen
  - lora
  - instruction-tuning
datasets:
  - Vokturz/sourceforge-app-screenshots-ocr
base_model:
  - unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit

Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

Model Details

Model Description

Loyca-Qwen3-VL-2B-Instruct-OCR is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for visual text recognition (OCR) and screen content understanding.
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

Model Sources

Repository: https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR
Base model: https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
Fine-tuning run: W&B Experiment

Uses

This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.

The model is not designed for:

Handwritten OCR
Scene text in natural environments (e.g., street signs)
Legal or financial document processing without human review

Training Details

Training Data

The model was trained on Vokturz/sourceforge-app-screenshots-ocr (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.

The dataset focuses on clean UI text extraction rather than general image captioning.

Training Hyperparameters

Parameter	Value
Epochs	8
Batch size	8
Learning rate	3e-4
LoRA rank	64
LoRA alpha	64
Precision	bfloat16 (mixed)
Optimizer	AdamW
Scheduler	Cosine decay
Gradient accumulation	2
Weight decay	0.01