---
library_name: transformers
tags:
- vision-language
- ocr
- multimodal
- qwen
- lora
- instruction-tuning
datasets:
- Vokturz/sourceforge-app-screenshots-ocr
base_model:
- unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit
---

# Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

## Model Details

### Model Description

**Loyca-Qwen3-VL-2B-Instruct-OCR** is a lightweight LoRA adapter built on top of **Qwen/Qwen3-VL-2B-Instruct**, fine-tuned for **visual text recognition (OCR)** and **screen content understanding**. It enhances the base model’s ability to read and interpret text embedded in images, particularly screenshots and user interfaces, and to respond with structured, instruction-following outputs.

### Model Sources

- **Repository:** [https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR](https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR)
- **Base model:** [https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct)
- **Fine-tuning run:** [W&B Experiment](https://wandb.ai/vokturz/Loyca-Qwen3-VL-2B-OCR)

---

## Uses

This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.

The model is **not designed** for:

* Handwritten OCR
* Scene text in natural environments (e.g., street signs)
* Legal or financial document processing without human review

---

## Training Details

### Training Data

The model was trained on **`Vokturz/sourceforge-app-screenshots-ocr`** (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements. The dataset focuses on **clean UI text extraction** rather than general image captioning.

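
To inspect the training data described above, the dataset can probably be pulled from the Hub with the `datasets` library. The following is a minimal sketch, not the author's loading code: the helper names `load_ocr_dataset` and `describe` are hypothetical, and because this card does not state the dataset's split or column names, the sketch only reports whatever splits the Hub returns.

```python
# Dataset ID as given in this model card.
DATASET_ID = "Vokturz/sourceforge-app-screenshots-ocr"


def load_ocr_dataset(dataset_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub (requires network access).

    Hypothetical helper: split and column names are not documented in the
    card, so nothing about the schema is assumed here.
    """
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id)


def describe(dataset_dict) -> dict:
    """Map each split name to its number of records."""
    return {name: len(split) for name, split in dataset_dict.items()}
```

Calling `describe(load_ocr_dataset())` should list the available splits with their sizes, which can be compared against the roughly 1100 records reported above.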
### Training Hyperparameters

| Parameter             | Value            |
| --------------------- | ---------------- |
| Epochs                | 8                |
| Batch size            | 8                |
| Learning rate         | 3e-4             |
| LoRA rank             | 64               |
| LoRA alpha            | 64               |
| Precision             | bfloat16 (mixed) |
| Optimizer             | AdamW            |
| Scheduler             | Cosine decay     |
| Gradient accumulation | 2                |
| Weight decay          | 0.01             |
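
The hyperparameters above imply a few derived quantities that are useful when reproducing or adjusting the run: the effective batch size, an approximate number of optimizer steps, the learning rate at a given step, and the LoRA scaling factor. The sketch below works these out under assumptions the card does not confirm: exactly 1100 records, a single device, no warmup, cosine decay to zero, a partial final batch counting as a step, and standard `alpha / r` LoRA scaling (not rank-stabilized LoRA).

```python
import math

# Values taken from the hyperparameter table.
EPOCHS = 8
BATCH_SIZE = 8
GRAD_ACCUM = 2
PEAK_LR = 3e-4
LORA_RANK = 64
LORA_ALPHA = 64

# Assumption: the card says "~1100 records"; 1100 is used as a round figure.
NUM_RECORDS = 1100


def effective_batch_size() -> int:
    """Samples contributing to each optimizer step on one device."""
    return BATCH_SIZE * GRAD_ACCUM


def total_optimizer_steps() -> int:
    """Approximate step count, assuming a partial final batch still steps."""
    steps_per_epoch = math.ceil(NUM_RECORDS / effective_batch_size())
    return steps_per_epoch * EPOCHS


def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate after `step` steps under plain cosine decay to zero."""
    progress = min(step, total_steps) / total_steps
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def lora_scale() -> float:
    """Standard LoRA scaling factor alpha / r applied to the low-rank update."""
    return LORA_ALPHA / LORA_RANK
```

Under these assumptions the effective batch size is 16, the run takes about 552 optimizer steps, and the LoRA update is applied with a scale of 1.0 because rank and alpha are equal.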