Kren Vision
Kren Vision is a fine-tuned vision-language model for optical character recognition (OCR) of Northeast Indian languages. It is part of the Kren AI Stack by MWire Labs, focused on building foundational language technology for Northeast India's indigenous languages.
It is built on an open-source vision-language model (the `qwen3_vl` architecture) and fine-tuned with LoRA on 618k deduplicated synthetic OCR samples across six Latin-script Northeast Indian languages.
Supported Languages
| Language | Script |
|---|---|
| Mizo | Latin |
| Garo | Latin |
| Khasi | Latin |
| Kokborok | Latin |
| Nagamese | Latin |
| Nyishi | Latin |
Performance
Evaluated on 500 held-out test samples:
| Metric | Score |
|---|---|
| Exact match (share of samples transcribed perfectly) | 92.60% |
| Character error rate (CER, lower is better) | 0.85% |
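The two metrics can be reproduced on your own data with a small script. This is a minimal sketch, not MWire Labs' evaluation code: the card does not say whether text was normalized or how CER was aggregated, so the sketch assumes no normalization and corpus-level CER (total character edits divided by total reference characters). The function names `levenshtein` and `ocr_metrics` are hypothetical.

```python
from typing import List, Tuple


def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def ocr_metrics(references: List[str], hypotheses: List[str]) -> Tuple[float, float]:
    """Return (exact_match, cer) as fractions.

    Exact match is per sample; CER is corpus-level and can exceed 1.0
    if the model emits much more text than the reference contains.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    exact = sum(r == h for r, h in zip(references, hypotheses))
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return exact / len(references), edits / max(chars, 1)


if __name__ == "__main__":
    em, cer = ocr_metrics(["kren vision", "ocr test"], ["kren vision", "ocr tost"])
    print(f"Exact match: {em:.2%}  CER: {cer:.2%}")
```

Averaging per-sample CER instead of pooling characters weights short lines more heavily, so the two choices can give noticeably different numbers on the same outputs.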
Usage
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Load the processor (tokenizer + image preprocessor) and the model in bfloat16.
# device_map="auto" requires the accelerate package.
processor = AutoProcessor.from_pretrained("MWirelabs/kren-vision")
model = AutoModelForImageTextToText.from_pretrained(
    "MWirelabs/kren-vision",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# A single user turn: the image to read, followed by the OCR instruction
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "your_image.jpg"},
        {"type": "text", "text": "OCR the text in this image."},
    ]}
]

# Apply the chat template, load the image, and tokenize in one call
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
# generate() returns prompt + completion; keep only the newly generated tokens
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
output = processor.batch_decode(trimmed, skip_special_tokens=True)
print(output[0])
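To transcribe several images in one `generate()` call, the same steps can be batched. This is a sketch under assumptions, not part of the official card: it assumes a recent Transformers release whose `apply_chat_template` accepts a list of conversations with `padding=True`, and the helper names `build_ocr_messages` and `ocr_batch` are hypothetical. Left padding is used so that every prompt ends at the same position, which lets one slice strip all prompts from the output.

```python
from typing import Dict, List

MODEL_ID = "MWirelabs/kren-vision"
PROMPT = "OCR the text in this image."  # same instruction as the Usage example


def build_ocr_messages(image_path: str, prompt: str = PROMPT) -> List[Dict]:
    """One single-turn conversation in the chat format used by the Usage example."""
    return [{"role": "user", "content": [
        {"type": "image", "image": image_path},
        {"type": "text", "text": prompt},
    ]}]


def ocr_batch(image_paths: List[str], max_new_tokens: int = 128) -> List[str]:
    """Hypothetical helper: OCR several images with a single generate() call."""
    # Heavy imports stay local so build_ocr_messages works without them.
    import torch
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    # Left padding keeps every prompt flush against its generated tokens.
    processor.tokenizer.padding_side = "left"
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    conversations = [build_ocr_messages(p) for p in image_paths]
    inputs = processor.apply_chat_template(
        conversations, tokenize=True, add_generation_prompt=True,
        return_dict=True, return_tensors="pt", padding=True,
    ).to(model.device)

    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # With left padding, all prompts share one length, so one slice removes them.
    new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)
```

For example, `ocr_batch(["page1.jpg", "page2.jpg"])` (hypothetical file names) returns one transcription per image, in input order. Very different image sizes produce different numbers of vision tokens, so batches of similarly sized scans pad less.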
Training
- Data: 618k deduplicated synthetic OCR samples across 6 languages
- Fine-tuning: LoRA (r=16, alpha=32) on vision and language projection layers
- Hardware: NVIDIA RTX 6000 Ada (48GB)
- Epochs: 2
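The LoRA hyperparameters above control a low-rank update added to each frozen weight. The sketch below shows the standard LoRA formulation that `r` and `alpha` parameterize, `y = x W^T + (alpha / r) * x A^T B^T`, so r=16 and alpha=32 give a scaling factor of 2.0 (the same factor PEFT's `LoraConfig(r=16, lora_alpha=32)` uses by default). It is illustrative only: the layer size is hypothetical, and the card does not state the exact target modules, dropout, or initialization used for Kren Vision.

```python
import numpy as np

R, ALPHA = 16, 32         # values from the Training section
D_IN, D_OUT = 1024, 1024  # hypothetical layer size, for illustration only


def lora_forward(x: np.ndarray, W: np.ndarray, A: np.ndarray,
                 B: np.ndarray, alpha: float) -> np.ndarray:
    """Frozen base projection plus the scaled low-rank update (only A, B train)."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


def merge_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Fold the adapter into one weight matrix: W + (alpha / r) * B @ A."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((D_OUT, D_IN))     # frozen base weight
    A = rng.standard_normal((R, D_IN)) * 0.01  # trainable down-projection
    B = np.zeros((D_OUT, R))                   # trainable up-projection, zero at start
    x = rng.standard_normal((4, D_IN))         # batch of 4 input vectors

    # With B = 0 the adapter is a no-op, so training starts from the base model.
    print("identical at init:", np.allclose(lora_forward(x, W, A, B, ALPHA), x @ W.T))
    base, adapter = W.size, A.size + B.size
    print(f"trainable share for this layer: {adapter / (base + adapter):.2%}")
```

Because the update merges into a single matrix, a merged checkpoint runs at the same inference cost as the base model.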
Citation
@misc{kren-vision-2026,
title={Kren Vision: OCR for Northeast Indian Languages},
author={MWire Labs},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/MWirelabs/kren-vision}
}
License
CC-BY-4.0 — MWire Labs, 2026