---
library_name: transformers
tags:
- vision-language
- ocr
- multimodal
- qwen
- lora
- instruction-tuning
datasets:
- Vokturz/sourceforge-app-screenshots-ocr
base_model:
- unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit
---

# Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

## Model Details

### Model Description

**Loyca-Qwen3-VL-2B-Instruct-OCR** is a lightweight LoRA adapter built on top of **Qwen/Qwen3-VL-2B-Instruct**, fine-tuned for **visual text recognition (OCR)** and **screen content understanding**.  
It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

### Model Sources

- **Repository:** [https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR](https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR)  
- **Base model:** [https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct)  
- **Fine-tuning run:** [W&B Experiment](https://wandb.ai/vokturz/Loyca-Qwen3-VL-2B-OCR)  

---

## Uses

This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.  

The model is **not designed** for:

* Handwritten OCR
* Scene text in natural environments (e.g., street signs)
* Legal or financial document processing without human review

---

## Training Details

### Training Data

The model was trained on **`Vokturz/sourceforge-app-screenshots-ocr`** (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.

The dataset focuses on **clean UI text extraction** rather than general image captioning.

### Training Hyperparameters

| Parameter             | Value            |
| --------------------- | ---------------- |
| Epochs                | 8                |
| Batch size            | 8                |
| Learning rate         | 3e-4             |
| LoRA rank             | 64               |
| LoRA alpha            | 64               |
| Precision             | bfloat16 (mixed) |
| Optimizer             | AdamW            |
| Scheduler             | Cosine decay     |
| Gradient accumulation | 2                |
| Weight decay          | 0.01             |