---
language:
- en
- hi
- bn
- ta
- te
- gu
- kn
- ml
- mr
- or
- pa
- ur
- as
- brx
- doi
- gom
- kas
- mai
- mni
- ne
- sa
- sat
- sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- multilingual
- indic-languages
- lora
- translation
- document-understanding
- fine-tuned
datasets:
- ai4bharat/BPCC
- ai4bharat/Pralekha
- ai4bharat/indicdlp
- lmms-lab/DocVQA
pipeline_tag: image-text-to-text
---

# Sarvam-1-VL-4B-Instruct - LoRA Adapter

## Model Description

A vision-language model for Indic languages, fine-tuned from Qwen3-VL-4B-Instruct. This repository contains the **LoRA adapter** only; it must be loaded on top of, or merged into, the base model before use.

## Training Details

- **Base Model:** Qwen/Qwen3-VL-4B-Instruct
- **Training Method:** LoRA (Rank 128, Alpha 256)
- **Training Steps:** 2,000
- **Training Time:** ~8.9 hours
- **Final Loss:** 6.25
- **Effective Batch Size:** 16

## Datasets

Trained on a mix of four datasets:

- **Translation** (40%): BPCC - 22 Indic languages ↔ English
- **Instruction Following** (20%): Pralekha - 11 language pairs
- **Document Layout** (30%): IndicDLP - Document understanding
- **Visual QA** (10%): DocVQA - Question answering

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

## Usage

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit precision to reduce memory use
model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Load the LoRA adapter on top of the base model
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# Use for inference
```

## License

Apache 2.0

## Citation

If you use this model, please cite the original Qwen3-VL paper and the datasets used in training.
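
## Notes on the Training Configuration

The figures under Training Details and Datasets can be combined into a few derived quantities. The sketch below is only back-of-the-envelope arithmetic on the numbers stated in this card. It assumes the standard LoRA scaling factor of alpha divided by rank (not a rank-stabilized variant) and assumes the dataset percentages are per-sample sampling proportions; neither assumption is confirmed by the card, and the variable names are illustrative.

```python
# Figures copied from the "Training Details" and "Datasets" sections of this card.
LORA_RANK = 128
LORA_ALPHA = 256
TRAINING_STEPS = 2_000
EFFECTIVE_BATCH_SIZE = 16
TRAINING_HOURS = 8.9  # approximate, as stated in the card

# Assumption: the percentages describe how training samples were drawn.
DATASET_MIX_PERCENT = {
    "BPCC": 40,
    "Pralekha": 20,
    "IndicDLP": 30,
    "DocVQA": 10,
}

# Assumption: standard LoRA scaling, where the low-rank update is
# multiplied by alpha / rank before being added to the frozen weights.
lora_scale = LORA_ALPHA / LORA_RANK

# Total samples processed = optimizer steps x effective batch size.
samples_seen = TRAINING_STEPS * EFFECTIVE_BATCH_SIZE

# Approximate per-dataset sample counts under the assumed mix.
approx_samples_per_dataset = {
    name: samples_seen * percent // 100
    for name, percent in DATASET_MIX_PERCENT.items()
}

# Average wall-clock time per optimizer step, in seconds.
seconds_per_step = TRAINING_HOURS * 3600 / TRAINING_STEPS

print(f"LoRA scale (alpha / rank): {lora_scale}")
print(f"Samples seen: {samples_seen}")
for name, count in approx_samples_per_dataset.items():
    print(f"  {name}: ~{count} samples")
print(f"Average time per step: ~{seconds_per_step:.1f} s")
```

Under these assumptions the adapter update is scaled by a factor of 2, and the run covers 32,000 samples in total, which is a small number relative to the size of the source datasets.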