---
language:
- en
- hi
- bn
- ta
- te
- gu
- kn
- ml
- mr
- or
- pa
- ur
- as
- brx
- doi
- gom
- kas
- mai
- mni
- ne
- sa
- sat
- sd
license: apache-2.0
base_model: Qwen/Qwen3-VL-4B-Instruct
tags:
- vision
- multilingual
- indic-languages
- lora
- translation
- document-understanding
- fine-tuned
datasets:
- ai4bharat/BPCC
- ai4bharat/Pralekha
- ai4bharat/indicdlp
- lmms-lab/DocVQA
pipeline_tag: image-text-to-text
---

# Sarvam-1-VL-4B-Instruct - LoRA Adapter

## Model Description

Fine-tuned vision-language model for Indic languages based on Qwen3-VL-4B-Instruct. This is the **LoRA adapter** that needs to be merged with the base model.

## Training Details

- **Base Model:** Qwen/Qwen3-VL-4B-Instruct
- **Training Method:** LoRA (Rank 128, Alpha 256)
- **Training Steps:** 2,000
- **Training Time:** ~8.9 hours
- **Final Loss:** 6.25
- **Effective Batch Size:** 16

## Datasets

Trained on 4 datasets covering:
- **Translation** (40%): BPCC - 22 Indic languages ↔ English
- **Instruction Following** (20%): Pralekha - 11 language pairs
- **Document Layout** (30%): IndicDLP - Document understanding
- **Visual QA** (10%): DocVQA - Question answering

## Supported Languages

Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Marathi, Manipuri, Nepali, Odia, Punjabi, Sanskrit, Santali, Sindhi, Tamil, Telugu, Urdu, English

## Usage

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "Qwen/Qwen3-VL-4B-Instruct",
    load_in_4bit=True,
)

# Load LoRA adapter
model.load_adapter("mashriram/Sarvam-1-VL-4B-Instruct")

# Use for inference
```

## License

Apache 2.0

## Citation

If you use this model, please cite the original Qwen3-VL paper and the datasets used.