Vokturz
/

Loyca-Qwen3-VL-2B-Instruct-OCR

vision-language

instruction-tuning

Model card Files Files and versions

Loyca-Qwen3-VL-2B-Instruct-OCR / README.md

Vokturz's picture

Update README.md

985c9a5 verified 8 months ago

|

History Blame Contribute Delete

2.31 kB

	---
	library_name: transformers
	tags:
	- vision-language
	- ocr
	- multimodal
	- qwen
	- lora
	- instruction-tuning
	datasets:
	- Vokturz/sourceforge-app-screenshots-ocr
	base_model:
	- unsloth/Qwen3-VL-2B-Instruct-unsloth-bnb-4bit
	---

	# Model Card for Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR

	## Model Details

	### Model Description

	Loyca-Qwen3-VL-2B-Instruct-OCR is a lightweight LoRA adapter built on top of Qwen/Qwen3-VL-2B-Instruct, fine-tuned for visual text recognition (OCR) and screen content understanding.
	It enhances the base model’s ability to read and interpret text embedded in images — particularly screenshots and user interfaces — and respond with structured, instruction-following outputs.

	### Model Sources

	- Repository: [https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR](https://huggingface.co/Vokturz/Loyca-Qwen3-VL-2B-Instruct-OCR)
	- Base model: [https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct)
	- Fine-tuning run: [W&B Experiment](https://wandb.ai/vokturz/Loyca-Qwen3-VL-2B-OCR)

	---

	## Uses

	This model can be used directly for Optical Character Recognition (OCR) on screenshots, UI layouts, or application previews.

	The model is not designed for:

	* Handwritten OCR
	* Scene text in natural environments (e.g., street signs)
	* Legal or financial document processing without human review

	---

	## Training Details

	### Training Data

	The model was trained on `Vokturz/sourceforge-app-screenshots-ocr` (~1100 records), a custom dataset of annotated application screenshots containing readable text and UI elements.

	The dataset focuses on clean UI text extraction rather than general image captioning.

	### Training Hyperparameters

	\| Parameter \| Value \|
	\| --------------------- \| ---------------- \|
	\| Epochs \| 8 \|
	\| Batch size \| 8 \|
	\| Learning rate \| 3e-4 \|
	\| LoRA rank \| 64 \|
	\| LoRA alpha \| 64 \|
	\| Precision \| bfloat16 (mixed) \|
	\| Optimizer \| AdamW \|
	\| Scheduler \| Cosine decay \|
	\| Gradient accumulation \| 2 \|
	\| Weight decay \| 0.01 \|