MiniCPM-V 4.6 — Indian Invoice Extraction (Merged)

Fine-tuned openbmb/MiniCPM-V-4.6 for structured JSON extraction from Indian distributor (kirana) invoices.

QLoRA adapter weights are fully merged into the base model — no PEFT dependency at inference time. Part of the Kirana Detective project: a six-agent AI pipeline that audits invoices for pricing anomalies, missing deliveries, and GST errors.

Model Details

Attribute	Value
Base model	openbmb/MiniCPM-V-4.6
Task	Vision-language OCR + structured JSON extraction
Fine-tuning method	QLoRA — 4-bit NF4 base, LoRA rank 16, α 32
Trainable parameters	9,486,336 / 1,309,914,352 (0.72%)
Target modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Training epochs	3
Final eval loss	0.2120 (↓ from 0.2901 at epoch 1)
Training hardware	NVIDIA A10G 22 GB VRAM (Modal)
Training duration	~52 minutes
Output format	Merged full weights — bfloat16
Inference runtime	`transformers` (`AutoModel` + `model.chat()`)
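
The trainable-parameter share and the LoRA scaling factor implied by the table can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures listed above; the `alpha / rank` scaling is the usual LoRA convention and is assumed here, not confirmed against the training script.

```python
# Figures copied from the Model Details table.
trainable_params = 9_486_336
total_params = 1_309_914_352
lora_rank = 16
lora_alpha = 32

# Share of parameters updated during fine-tuning, as a percentage.
trainable_pct = 100 * trainable_params / total_params

# Conventional LoRA scaling applied to the low-rank update
# (assumption: the training run used the standard alpha / rank rule).
lora_scaling = lora_alpha / lora_rank

print(f"trainable share: {trainable_pct:.2f}%")
print(f"LoRA scaling factor: {lora_scaling}")
```

The computed share rounds to the 0.72% reported in the table.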

Training Data

Dataset: build-small-hackathon/kirana-invoice-train-data

Split	Examples
Train	450
Eval	50

Synthetic Indian distributor invoices generated with Pillow across:

10 suppliers: HUL, Nestlé, Parle, Britannia, ITC, Amul, Dabur, Marico, Emami, Godrej
4 invoice formats: Printed GST bill, Tally PDF export, handwritten, WhatsApp screenshot
Injected errors: GST rate mismatches, duplicate line items, and price spikes, so that the model learns to surface extraction warnings alongside the extracted data
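
To make the error-injection idea concrete, here is a hedged sketch of how a duplicate line item could be added to a clean record together with its warning. The generator's real logic is not documented in this card, so `inject_duplicate_item` and the warning wording are hypothetical; the field names follow the Output Schema section of this card.

```python
import copy
import random


def inject_duplicate_item(invoice: dict, rng: random.Random) -> dict:
    """Return a copy of `invoice` with one line item repeated and a matching warning.

    Hypothetical helper: illustrates the idea only, not the dataset's actual generator.
    """
    corrupted = copy.deepcopy(invoice)  # leave the clean record untouched
    duplicated = copy.deepcopy(rng.choice(corrupted["items"]))
    corrupted["items"].append(duplicated)
    corrupted["extraction_warnings"].append(
        f"Possible duplicate line item: {duplicated['product_raw']}"
    )
    return corrupted


clean = {
    "items": [
        {"product_raw": "SURF XL 1KG", "quantity": 12, "unit_price": 95.0,
         "gst_rate": 18, "line_total": 1140.0},
    ],
    "extraction_warnings": [],
}
noisy = inject_duplicate_item(clean, random.Random(0))
print(len(clean["items"]), len(noisy["items"]))
print(noisy["extraction_warnings"])
```

Passing a seeded `random.Random` keeps the corruption reproducible, which matters when the same synthetic invoice must be regenerated for both the image and its label.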

Training Metrics

Epoch	Train Loss	Eval Loss
1	—	0.2901
2	—	0.2281
3	—	0.2120
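
The relative improvement between epochs follows directly from the eval losses in the table. The sketch below is plain arithmetic on those three values; `relative_drop` is an illustrative helper name, not part of any training code.

```python
# Eval loss per epoch, copied from the Training Metrics table.
eval_loss = {1: 0.2901, 2: 0.2281, 3: 0.2120}


def relative_drop(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100 * (before - after) / before


epochs = sorted(eval_loss)
for prev, curr in zip(epochs, epochs[1:]):
    drop = relative_drop(eval_loss[prev], eval_loss[curr])
    print(f"epoch {prev} to {curr}: {drop:.1f}% lower")

overall = relative_drop(eval_loss[1], eval_loss[3])
print(f"epoch 1 to 3: {overall:.1f}% lower")
```

Most of the gain comes between epochs 1 and 2; the step from epoch 2 to 3 is considerably smaller.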

Supported Input Formats

Format	Example
Printed GST invoice	Standard B2B tax invoice with HSN codes
Tally PDF export	Machine-generated tabular layout
Handwritten invoice	Photo of handwritten bill
WhatsApp screenshot	Low-resolution forwarded invoice image

Output Schema

The model is trained to return a single JSON object with no markdown and no prose. An example output:

{
  "invoice_number": "INV-2024-001",
  "supplier": "Hindustan Unilever Ltd.",
  "date": "2026-06-10",
  "items": [
    {
      "product_raw": "SURF XL 1KG",
      "quantity": 12,
      "unit_price": 95.00,
      "gst_rate": 18,
      "line_total": 1140.00
    },
    {
      "product_raw": "MAGGI MASALA 70G",
      "quantity": 48,
      "unit_price": 14.00,
      "gst_rate": 5,
      "line_total": 672.00
    }
  ],
  "grand_total": 9650.00,
  "extraction_warnings": []
}

Field notes:

product_raw — verbatim as printed on the invoice (abbreviations, typos preserved)
gst_rate — percentage value (5, 12, 18, 28), not a decimal
date — ISO 8601 (YYYY-MM-DD) when parseable, raw string otherwise
extraction_warnings — list of issues noticed (missing fields, illegible areas, GST anomalies)
Numeric fields default to 0 when unreadable; invoice_number/supplier/date default to null
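
The field notes above translate into a few checks that can run on every extracted record before it is passed downstream. This is a sketch under stated assumptions: `check_invoice`, `ALLOWED_GST_RATES`, and the tolerance are hypothetical, and the rule that `line_total` equals `quantity * unit_price` is inferred from the example output, not stated by the card.

```python
ALLOWED_GST_RATES = {5, 12, 18, 28}  # the rates named in the field notes
TOP_LEVEL_FIELDS = ("invoice_number", "supplier", "date", "items",
                    "grand_total", "extraction_warnings")
ITEM_FIELDS = ("product_raw", "quantity", "unit_price", "gst_rate", "line_total")


def check_invoice(record: dict, tolerance: float = 0.01) -> list[str]:
    """Return human-readable problems found in one extracted invoice record."""
    problems = []
    for field in TOP_LEVEL_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    for index, item in enumerate(record.get("items", [])):
        missing = [name for name in ITEM_FIELDS if name not in item]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
            continue
        # gst_rate should be a percentage such as 18, not a decimal such as 0.18.
        if item["gst_rate"] not in ALLOWED_GST_RATES:
            problems.append(f"item {index}: unexpected gst_rate {item['gst_rate']}")
        # Assumption: line_total is the pre-tax product of quantity and unit_price.
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["line_total"]) > tolerance:
            problems.append(f"item {index}: line_total does not match quantity * unit_price")
    return problems


record = {
    "invoice_number": "INV-2024-001",
    "supplier": "Hindustan Unilever Ltd.",
    "date": "2026-06-10",
    "items": [
        {"product_raw": "SURF XL 1KG", "quantity": 12, "unit_price": 95.00,
         "gst_rate": 18, "line_total": 1140.00},
        {"product_raw": "MAGGI MASALA 70G", "quantity": 48, "unit_price": 14.00,
         "gst_rate": 0.05, "line_total": 672.00},  # decimal rate on purpose
    ],
    "grand_total": 9650.00,
    "extraction_warnings": [],
}
problems = check_invoice(record)
print(problems)
```

Because unreadable numeric fields default to 0, a stricter variant might also treat a `gst_rate` of 0 as "unreadable" instead of "unexpected"; that choice is left to the caller.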

Usage

Basic Inference

import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

# Model repository on the Hugging Face Hub.
MODEL_ID = "naazimsnh02/minicpm-v-4-6-indian-invoice-extraction-merged"

# trust_remote_code=True loads the custom model code that provides model.chat().
model = AutoModel.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # matches the merged bfloat16 weights
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

image = Image.open("invoice.jpg").convert("RGB")

prompt = (
    "You are an OCR agent for Indian kirana store invoices. "
    "Extract all information from this invoice image and return ONLY valid JSON "
    "matching this schema exactly:\n"
    '{"invoice_number": string|null, "supplier": string|null, "date": string|null, '
    '"items": [{"product_raw": string, "quantity": number, "unit_price": number, '
    '"gst_rate": number, "line_total": number}], '
    '"grand_total": number, "extraction_warnings": [string]}\n'
    "Return ONLY the JSON object, no markdown, no prose."
)

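# The image travels inside the message content, so the separate `image` argument stays None.
msgs = [{"role": "user", "content": [image, prompt]}]
# sampling=False uses deterministic decoding instead of sampling.
response = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False, max_new_tokens=2048)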
print(response)
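
The prompt asks for bare JSON, but a defensive parser is still useful in case a response arrives wrapped in a code fence or a stray sentence. The sketch below is one possible approach, not part of the model's API: `parse_model_json` is a hypothetical helper that keeps only the outermost `{...}` span before decoding.

```python
import json


def parse_model_json(raw: str) -> dict:
    """Decode the JSON object contained in a model response.

    Keeps the text between the first "{" and the last "}", which also
    discards markdown fences or prose around the object.
    """
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])


# Stand-in for a response that ignored the "no prose" instruction.
sample = 'Here is the result:\n{"invoice_number": null, "items": [], "grand_total": 0}\n'
invoice = parse_model_json(sample)
print(invoice["grand_total"])
```

`json.loads` still raises `json.JSONDecodeError` when the extracted span is not valid JSON (for example, output truncated by `max_new_tokens`), so callers may want to catch that and record a warning.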

From a PDF (multi-page)

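import io
import json

import fitz  # PyMuPDF
from PIL import Image

# Reuses `model`, `tokenizer`, and `prompt` from the Basic Inference example.
results = []
with fitz.open("invoice.pdf") as doc:
    for page in doc:
        # Render at 2x zoom (about 144 DPI) so small print stays legible.
        pix = page.get_pixmap(matrix=fitz.Matrix(2.0, 2.0))
        img = Image.open(io.BytesIO(pix.tobytes("png"))).convert("RGB")
        msgs = [{"role": "user", "content": [img, prompt]}]
        raw = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False, max_new_tokens=2048)
        # json.loads raises json.JSONDecodeError if the output is not pure JSON.
        results.append(json.loads(raw))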
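
The loop above yields one record per page, so a multi-page invoice still has to be combined into a single record. The card does not say how the pipeline does this; the sketch below is one hypothetical policy (`merge_pages` is an invented name): take header fields from the first page that has them, concatenate items, and trust the last non-zero `grand_total`.

```python
def merge_pages(pages: list[dict]) -> dict:
    """Combine per-page extraction records into one invoice record (illustrative policy)."""
    merged = {
        "invoice_number": None,
        "supplier": None,
        "date": None,
        "items": [],
        "grand_total": 0,
        "extraction_warnings": [],
    }
    for number, page in enumerate(pages, start=1):
        # Header fields: keep the first non-null value seen.
        for key in ("invoice_number", "supplier", "date"):
            if merged[key] is None and page.get(key) is not None:
                merged[key] = page[key]
        merged["items"].extend(page.get("items", []))
        # Assumption: the final total is printed on the last page that reports one.
        if page.get("grand_total"):
            merged["grand_total"] = page["grand_total"]
        # Prefix warnings with the page number so they stay traceable.
        merged["extraction_warnings"].extend(
            f"page {number}: {warning}" for warning in page.get("extraction_warnings", [])
        )
    return merged


pages = [
    {"invoice_number": "INV-2024-001", "supplier": "Hindustan Unilever Ltd.", "date": "2026-06-10",
     "items": [{"product_raw": "SURF XL 1KG", "quantity": 12, "unit_price": 95.0,
                "gst_rate": 18, "line_total": 1140.0}],
     "grand_total": 0, "extraction_warnings": []},
    {"invoice_number": None, "supplier": None, "date": None,
     "items": [{"product_raw": "MAGGI MASALA 70G", "quantity": 48, "unit_price": 14.0,
                "gst_rate": 5, "line_total": 672.0}],
     "grand_total": 9650.0, "extraction_warnings": ["total partially illegible"]},
]
invoice = merge_pages(pages)
print(invoice["invoice_number"], len(invoice["items"]), invoice["grand_total"])
```

This policy is deliberately simple; invoices with per-page subtotals may need a different rule for `grand_total`.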

Limitations

Trained on synthetic invoices only — real-world performance may vary on heavily degraded, stamped, or non-standard layouts until production data is collected.
Optimised for English and numeric invoice content; Hindi/regional-language invoices are not yet covered.
Product names are extracted verbatim (product_raw) — normalization to canonical SKU names is handled downstream by the MiniCPM5-1B normalizer agent.
grand_total extraction can fail on invoices with complex multi-page subtotal structures.

Citation

@misc{kirana_detective_minicpmv_2026,
  author    = {Syed Naazim Hussain},
  title     = {MiniCPM-V 4.6 Fine-Tuned for Indian Invoice Extraction},
  year      = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/naazimsnh02/minicpm-v-4-6-indian-invoice-extraction-merged}},
}

License

Apache 2.0 — same license as the base openbmb/MiniCPM-V-4.6 model.
