MODEL CARD: Kirana Detective Training Data & Fine-Tuned Models
Repository: build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged
Author: naazimsnh02
License: Apache 2.0 (MiniCPM models) / AGPL-3.0 (YOLO26n) / MIT (code)
Last Updated: June 10, 2026
Executive Summary
Kirana Detective is a complete fine-tuning pipeline for three state-of-the-art models that audit distributor invoices for Indian kirana (grocery) stores. This repository contains:
- Synthetic invoice generation (500 images across 4 formats)
- Fine-tuned MiniCPM-V 4.6: Invoice OCR & extraction (transformers, merged weights)
- Fine-tuned MiniCPM5-1B: Product name normalization (GGUF)
- Fine-tuned YOLO26n: Visual product detection (ONNX)
All models run locally without cloud APIs and are deployed in a six-agent pipeline to detect pricing anomalies, missing deliveries, and GST errors, reporting estimated rupee leakage with actionable corrections.
Project Overview
Problem Statement
Indian kirana store owners struggle to audit distributor invoices manually:
- Inconsistent product naming (abbreviations, typos, regional variants)
- Difficulty cross-referencing against inventory
- Manual photo counting is error-prone
- No standardized format for pricing lookups
- Estimated financial leakage: 5–15% of purchase budget
Solution
Kirana Detective automates the entire audit pipeline:
- Extract line items from invoice images (MiniCPM-V)
- Normalize product names (MiniCPM5-1B)
- Check prices against catalog
- Count inventory from delivery photos (YOLO26n)
- Reconcile invoiced vs. counted quantities
- Report discrepancies with rupee impact
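The price check, reconciliation, and rupee-impact steps can be sketched in a few lines. This is a minimal illustration rather than the app's actual code: the name `audit_line_items`, the dictionary shapes, the `tolerance` parameter, and the sample prices are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    product: str
    kind: str            # "overpriced" or "short_delivery"
    rupee_impact: float  # estimated leakage in rupees


def audit_line_items(line_items, catalog_prices, counted, tolerance=0.02):
    """Compare invoiced items against catalog prices and counted stock.

    line_items: dicts with "name", "quantity", "unit_price" (hypothetical shape)
    catalog_prices: canonical name -> expected unit price
    counted: canonical name -> units counted in delivery photos
    tolerance: relative price slack before flagging (assumed value)
    """
    findings = []
    for item in line_items:
        name, qty, price = item["name"], item["quantity"], item["unit_price"]
        expected = catalog_prices.get(name)
        # Flag a price that exceeds the catalog price by more than the tolerance.
        if expected is not None and price > expected * (1 + tolerance):
            impact = round((price - expected) * qty, 2)
            findings.append(Discrepancy(name, "overpriced", impact))
        # Flag units that were invoiced but not seen in the delivery photos.
        missing = qty - counted.get(name, 0)
        if missing > 0:
            findings.append(Discrepancy(name, "short_delivery", round(missing * price, 2)))
    return findings


# Made-up sample values, for illustration only.
items = [{"name": "Amul Butter 100g", "quantity": 10, "unit_price": 60.0}]
report = audit_line_items(items, {"Amul Butter 100g": 56.0}, {"Amul Butter 100g": 8})
for finding in report:
    print(finding.product, finding.kind, finding.rupee_impact)
```

Keeping each finding as a small record with its own rupee impact makes it straightforward to sum the total leakage or to sort findings by severity.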
Models in This Repository
Model 1: MiniCPM-V 4.6 (Invoice Extractor)
| Attribute | Details |
|---|---|
| Base Model | openbmb/MiniCPM-V-4.6 |
| Task | Vision-language OCR + structured extraction |
| Fine-tuning Method | QLoRA (4-bit quantization + LoRA rank 16) |
| Training Data | 500 synthetic invoices (450 train, 50 eval) |
| Trainable Parameters | 9,486,336 / 1,309,914,352 (0.72%) |
| Output Format | Merged full weights (bfloat16) |
| Inference Runtime | Transformers (AutoModel, model.chat()) |
| Hardware (Training) | NVIDIA A10G, 22 GB VRAM, 51 min 50 sec (actual) |
| Repository | build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged |
Input Formats Supported:
- Printed GST invoices (Pillow-generated PDFs)
- Tally PDF exports
- Handwritten invoices (photos)
- WhatsApp screenshot invoices
Output Structure (JSON):
{
  "supplier": "Distributor Name",
  "invoice_number": "INV-001",
  "line_items": [
    {
      "raw_name": "MAGGI NDL 70GM",
      "quantity": 10,
      "unit_price": 45.50,
      "gst_rate": 5,
      "total": 455.00
    }
  ],
  "invoice_total": 9650.00,
  "gst_total": 485.00
}
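Because the extractor is a generative model, it can help to sanity-check its output before passing it downstream. The sketch below is one possible check, not part of the repository: `check_invoice`, `REQUIRED_KEYS`, and the `tol` parameter are hypothetical names, and it assumes each line item's `total` is the pre-GST product of `quantity` and `unit_price`, as in the example above.

```python
import json

# Top-level keys taken from the output structure shown above.
REQUIRED_KEYS = ("supplier", "invoice_number", "line_items", "invoice_total", "gst_total")


def check_invoice(invoice, tol=0.01):
    """Return a list of human-readable problems found in one extracted invoice."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in invoice]
    for i, item in enumerate(invoice.get("line_items", [])):
        # ASSUMPTION: "total" is quantity * unit_price, before GST.
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["total"]) > tol:
            problems.append(f"line {i}: total {item['total']} does not match {expected:.2f}")
    return problems


sample = json.loads("""
{
  "supplier": "Distributor Name",
  "invoice_number": "INV-001",
  "line_items": [
    {"raw_name": "MAGGI NDL 70GM", "quantity": 10, "unit_price": 45.50,
     "gst_rate": 5, "total": 455.00}
  ],
  "invoice_total": 9650.00,
  "gst_total": 485.00
}
""")
print(check_invoice(sample))
```

An empty list means the invoice passed these checks; any entries can be routed to manual review.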
Model 2: MiniCPM5-1B (Product Name Normalizer)
| Attribute | Details |
|---|---|
| Base Model | openbmb/MiniCPM5-1B |
| Task | Text-to-text product name normalization |
| Fine-tuning Method | QLoRA (4-bit base, LoRA rank 16) |
| Training Data | 2,000 synthetic (raw, canonical) pairs (1,800 train, 200 eval) |
| Output Format | GGUF (quantized, ~1.2 GB) |
| Framework | Unsloth 2026.6.1 |
| Hardware (Training) | NVIDIA A10G, 22 GB VRAM, ~1 hour |
| Repository | build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer |
Example Mappings:
| Raw Input | Normalized Output |
|---|---|
| MAGGI NDL 70GM | Nestle Maggi Masala Noodles 70g |
| SURF XL 1K | Surf Excel Washing Powder 1kg |
| AMUL BTR 100 | Amul Butter 100g |
| COLGAT 100G | Colgate Strong Teeth Toothpaste 100g |
Training Data:
- Hand-curated catalog of 200 Indian FMCG SKUs
- Augmentation strategies: abbreviation expansion, typo injection, truncation, regional shorthand
- Covers 10 major distributors: ITC, Nestlé, Unilever, P&G, Reckitt, Britannia, Amul, Patanjali, etc.
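The augmentation strategies listed above could look roughly like the following. This is a hedged sketch, not the repository's generator: the helper names (`drop_vowels`, `inject_typo`, `truncate`, `make_pairs`) and the specific rules (vowel dropping, adjacent-character swaps) are assumptions chosen to illustrate abbreviation, typo injection, and truncation.

```python
import random


def drop_vowels(word):
    """Abbreviate a word: keep the first character, drop later vowels."""
    return word[0] + "".join(c for c in word[1:] if c.lower() not in "aeiou")


def inject_typo(text, rng):
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def truncate(text, rng, min_len=4):
    """Cut the string short, keeping at least min_len characters."""
    if len(text) <= min_len:
        return text
    return text[: rng.randint(min_len, len(text) - 1)]


def make_pairs(canonical, n, seed=42):
    """Generate n (raw, canonical) training pairs for one catalog entry."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        raw = " ".join(drop_vowels(w) for w in canonical.split()).upper()
        choice = rng.choice(["none", "typo", "truncate"])
        if choice == "typo":
            raw = inject_typo(raw, rng)
        elif choice == "truncate":
            raw = truncate(raw, rng)
        pairs.append((raw, canonical))
    return pairs


for raw, canonical in make_pairs("Amul Butter 100g", 3):
    print(raw, "->", canonical)
```

Using a dedicated `random.Random(seed)` instance keeps the generated pairs reproducible without touching the global random state.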
Model 3: YOLO26n (Product Detection)
| Attribute | Details |
|---|---|
| Base Model | YOLOv8 Nano |
| Task | Object detection (product localization & counting) |
| Fine-tuning Method | Supervised fine-tuning via Ultralytics |
| Training Data | 3 Roboflow datasets merged (~11,400 images) |
| Output Format | ONNX (opset 12, ~15 MB) + PyTorch checkpoint |
| Framework | Ultralytics YOLOv8 |
| Hardware (Training) | NVIDIA A10G, 100 epochs (60 + 40 resumed after restart) |
| Repository | build-small-hackathon/yolo26n-indian-fmcg-detection |
Classes: Unified class list built dynamically by merging all three dataset vocabularies (insertion-order dedup). The merged dataset spans 1,831 classes across grocery staples, personal care, beverages, and packaged foods. Full list in class_names.json.
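Insertion-order deduplication can be done with a plain dictionary, since Python dicts preserve insertion order. The sketch below shows the idea together with the id remapping each dataset's labels would need afterwards; the function names and the tiny class lists are illustrative assumptions, not the contents of `train_yolo26n.py`.

```python
def merge_class_lists(*class_lists):
    """Union the class names, keeping first-seen order and dropping repeats."""
    return list(dict.fromkeys(name for names in class_lists for name in names))


def build_remap(local_names, merged_names):
    """Map one dataset's local class ids to ids in the merged list."""
    index = {name: i for i, name in enumerate(merged_names)}
    return {local_id: index[name] for local_id, name in enumerate(local_names)}


# Illustrative class lists only; the real vocabularies come from the datasets.
dataset_a = ["Bournvita", "Society Tea"]
dataset_b = ["Society Tea", "Nescafe Coffee"]

merged = merge_class_lists(dataset_a, dataset_b)
print(merged)
print(build_remap(dataset_b, merged))
```

Every label file from a source dataset would then have its class ids rewritten through the corresponding remap before training on the merged set.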
Evaluation (merged 3-dataset run, final):
| Metric | Value |
|---|---|
| mAP50 (all classes) | 0.428 |
| mAP50-95 (all classes) | 0.302 |
| Total classes | 1,831 |
| Validation images | 1,236 |
| Validation instances | 13,443 |
Pilot run note (superseded): A prior single-dataset run (agentsk47 only, 10 classes) achieved mAP@50 = 0.993 / mAP@50-95 = 0.933. Those metrics do not apply to the full merged model.
Datasets Merged:
- agentsk47/indian-grocery-object-detection: v1, ~400 images, 10 classes
- iit-patna/grocery_items: v45, 6,695 images, 20 classes
- project-c5ho0/indian-market: v2, 4,694 images, 2 classes
Training Data & Datasets
Synthetic Invoice Generation (generate_invoices.py)
Purpose: Create diverse, realistic invoice images without requiring manual collection or OCR labor.
Configuration:
- 500 total invoices generated
- 4 formats: GST invoices, Tally PDFs, handwritten samples, WhatsApp screenshots
- Pure Pillow (no native dependencies)
- Randomized supplier names, quantities, prices, and GST rates
Generated Data Structure:
data/synthetic_invoices/
├── annotations.jsonl    # JSONL: {image_path, extracted_data}
├── printed_gst/         # 125 GST-compliant invoices
├── tally_pdf/           # 125 Tally PDF exports
├── handwritten/         # 125 handwritten photos
└── whatsapp/            # 125 WhatsApp screenshots
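A quick way to confirm the 125-per-format split is to tally the annotation records by folder. The sketch below assumes each `image_path` is relative to `data/synthetic_invoices/` and starts with the format folder (for example `printed_gst/0001.png`); that layout and the file names in the demo are assumptions, and `count_by_format` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import PurePosixPath


def count_by_format(lines):
    """Count annotation records per format folder.

    lines: any iterable of JSONL strings, such as an open file handle.
    """
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # ASSUMPTION: the first path component is the format folder.
        counts[PurePosixPath(record["image_path"]).parts[0]] += 1
    return counts


# In-memory demo records; real usage would pass the opened annotations.jsonl.
demo_lines = [
    '{"image_path": "printed_gst/0001.png", "extracted_data": {}}',
    '{"image_path": "printed_gst/0002.png", "extracted_data": {}}',
    '{"image_path": "whatsapp/0001.png", "extracted_data": {}}',
]
print(count_by_format(demo_lines))
```

On the generated dataset, each of the four folders would be expected to report 125 records.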
Each invoice includes:
- 5–20 line items
- Realistic pricing (₹10–₹5,000 per item)
- Correct GST calculations (5%, 12%, 18%)
- Real supplier names + product abbreviations
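The per-invoice constraints above translate into a small generator for the ground-truth annotation. This sketch is not `generate_invoices.py`: the function names, the quantity range, and the convention that `invoice_total` equals the taxable amount plus GST are assumptions for illustration.

```python
import random

GST_RATES = (5, 12, 18)  # rates listed above


def make_line_item(name, rng):
    """Build one random line item within the documented price range."""
    quantity = rng.randint(1, 50)  # ASSUMPTION: quantity range is not documented
    unit_price = round(rng.uniform(10, 5000), 2)
    return {
        "raw_name": name,
        "quantity": quantity,
        "unit_price": unit_price,
        "gst_rate": rng.choice(GST_RATES),
        "total": round(quantity * unit_price, 2),  # pre-GST line total
    }


def summarize(line_items):
    """Compute GST and the grand total for a list of line items."""
    taxable = sum(item["total"] for item in line_items)
    gst = sum(item["total"] * item["gst_rate"] / 100 for item in line_items)
    # ASSUMPTION: invoice_total is the taxable amount plus GST.
    return {"gst_total": round(gst, 2), "invoice_total": round(taxable + gst, 2)}


def make_invoice(names, seed=42):
    """Generate 5 to 20 line items drawn from the given product names."""
    rng = random.Random(seed)
    items = [make_line_item(rng.choice(names), rng) for _ in range(rng.randint(5, 20))]
    return {"line_items": items, **summarize(items)}


invoice = make_invoice(["MAGGI NDL 70GM", "SURF XL 1K", "AMUL BTR 100"])
print(len(invoice["line_items"]), invoice["gst_total"], invoice["invoice_total"])
```

The same dictionary can serve as both the rendering input for the image and the label written to `annotations.jsonl`, which keeps image and ground truth in sync.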
Quick Start
Installation
git clone https://github.com/naazimsnh02/kirana-detective.git
cd kirana-detective
pip install -r requirements.txt
Run Fine-tuning on Modal
# Set environment variables
export ROBOFLOW_API_KEY="<your-roboflow-api-key>"
export HF_TOKEN="<your-huggingface-token>"
modal token new
# Generate synthetic invoices
modal run finetune/generate_invoices.py
# Fine-tune all three models (sequential)
modal run finetune/train_minicpm_v.py # ~52 min in the recorded run
modal run finetune/train_minicpm5_1b.py # ~1 hour
modal run finetune/train_yolo26n.py # ~2 hours
Models are auto-published to HuggingFace Hub upon completion.
Local Inference
MiniCPM-V (Invoice Extraction): the published artifact is a set of merged transformers weights, not a GGUF file. Load the model with AutoModel and call model.chat(), as shown under "Usage in Production (Kirana Detective App)".
MiniCPM5-1B (Product Normalization):
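from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the normalizer and its tokenizer from the Hub
repo_id = "build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)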
YOLO26n (Object Detection):
from ultralytics import YOLO
model = YOLO("yolo26n_fmcg.onnx")
results = model.predict("shelf.jpg", imgsz=640)
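To turn detections into per-product counts for reconciliation, the predicted class ids can be tallied by name. The helper below is a sketch: `count_products` is a hypothetical name, the default `min_conf` of 0.25 mirrors the threshold mentioned under Known Limitations, and the demo values are made up.

```python
from collections import Counter


def count_products(class_ids, names, confidences=None, min_conf=0.25):
    """Tally detections per class name, skipping boxes below min_conf."""
    counts = Counter()
    for i, cls_id in enumerate(class_ids):
        if confidences is not None and confidences[i] < min_conf:
            continue
        counts[names[int(cls_id)]] += 1
    return dict(counts)


# With Ultralytics, the inputs would come from the first result, e.g.:
#   boxes = results[0].boxes
#   count_products(boxes.cls.tolist(), results[0].names, boxes.conf.tolist())
demo = count_products(
    [0.0, 0.0, 1.0],
    {0: "Bournvita", 1: "Society Tea"},
    [0.9, 0.1, 0.8],
)
print(demo)
```

Keeping the counting logic independent of the detector object makes it easy to unit-test without loading the ONNX model.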
Evaluation & Performance
MiniCPM-V Training Metrics (Actual Run, June 10, 2026)
| Epoch | Train Loss | Eval Loss | LR |
|---|---|---|---|
| 1 | 6.081 | 0.2901 | 8.83e-5 |
| 2 | 3.948 | 0.2281 | 4.94e-5 |
| 3 | 3.326 | 0.212 | 1.04e-5 |
- Training time: 51 min 50 sec (87 steps, 26 s/step on A10G)
- Avg gradient norm: 178 → 16 (stable convergence)
- Best checkpoint loaded: epoch 3 (eval loss 0.212)
- Final avg train loss across all steps: 4.774
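The step count can be cross-checked against the dataset size. The arithmetic below assumes an effective batch size of 16, which this card does not state; under that assumption, 450 training samples over 3 epochs give exactly the 87 reported steps. Dividing the total wall-clock time by the step count gives a higher figure than the 26 s/step quoted above, which may mean the quoted figure excludes evaluation and checkpointing overhead; that reading is a guess.

```python
import math

train_samples = 450    # from the training-data split
epochs = 3
total_steps = 87       # reported above
effective_batch = 16   # ASSUMPTION: not stated in this card

steps_per_epoch = math.ceil(train_samples / effective_batch)
print(steps_per_epoch, steps_per_epoch * epochs)  # 29 87

# Average wall-clock seconds per step over the whole 51 min 50 sec run.
wall_clock_s = 51 * 60 + 50
avg_s_per_step = round(wall_clock_s / total_steps, 1)
print(avg_s_per_step)  # 35.7
```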
Per-invoice-type breakdown (printed GST / Tally / handwritten / WhatsApp) is pending a held-out real-invoice test set and will be added in Phase 2.
MiniCPM5-1B Evaluation
| Metric | Value |
|---|---|
| Exact Match (normalized names) | 94.5% |
| Fuzzy Match (Levenshtein similarity > 0.8) | 98.2% |
| OOV Handling | 3.8% fail and are flagged for manual review |
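The two match metrics can be computed as follows. This is a sketch of one reasonable definition, not the evaluation script used for the numbers above: it assumes the similarity is one minus the edit distance divided by the longer string's length, and `levenshtein`, `similarity`, and `score` are hypothetical helper names.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def score(predictions, references, threshold=0.8):
    """Return (exact match rate, fuzzy match rate) over paired lists."""
    pairs = list(zip(predictions, references))
    exact = sum(p == r for p, r in pairs)
    fuzzy = sum(similarity(p, r) > threshold for p, r in pairs)
    return exact / len(pairs), fuzzy / len(pairs)


preds = ["Amul Butter 100g", "Amul Buter 100g"]
refs = ["Amul Butter 100g", "Amul Butter 100g"]
print(score(preds, refs))
```

Under this definition every exact match is also a fuzzy match, so the fuzzy rate is always at least the exact rate.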
YOLO26n Evaluation: Final Merged Run (3 datasets, 1,831 classes)
| Metric | Value |
|---|---|
| mAP50 (all classes) | 0.428 |
| mAP50-95 (all classes) | 0.302 |
| Total classes | 1,831 |
| Validation images | 1,236 |
| Validation instances | 13,443 |
| Best epoch | 100 (60 initial + 40 resumed) |
YOLO26n: Pilot Run Per-Class Metrics (single dataset, 10 classes, superseded)
These metrics are from an earlier run on the agentsk47 dataset only. Shown for reference; the production model uses all 3 merged datasets.
Per-class metrics at best epoch (65):
| Class | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|
| Bournvita | 0.902 | 1.000 | 0.995 | 0.995 |
| Mysore Sandal Soap | 1.000 | 0.905 | 0.995 | 0.944 |
| Nescafe Coffee | 0.927 | 1.000 | 0.995 | 0.908 |
| Nivea Body Lotion | 0.935 | 1.000 | 0.995 | 0.923 |
| Nivea Soft Cream | 0.924 | 1.000 | 0.995 | 0.895 |
| Parachute Coconut Oil | 1.000 | 0.819 | 0.972 | 0.928 |
| Patanjali Dant Kanti | 1.000 | 0.985 | 0.995 | 0.971 |
| Society Tea | 0.878 | 1.000 | 0.995 | 0.845 |
| Tresemmé Conditioner | 0.814 | 1.000 | 0.995 | 0.995 |
| Tresemmé Shampoo | 0.968 | 1.000 | 0.995 | 0.922 |
| Macro Average | 0.935 | 0.971 | 0.993 | 0.933 |
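The macro average row is the unweighted mean of each column across the ten classes. The snippet below recomputes it from the table values as a consistency check.

```python
# (precision, recall, mAP50, mAP50-95) per class, copied from the table above.
rows = {
    "Bournvita": (0.902, 1.000, 0.995, 0.995),
    "Mysore Sandal Soap": (1.000, 0.905, 0.995, 0.944),
    "Nescafe Coffee": (0.927, 1.000, 0.995, 0.908),
    "Nivea Body Lotion": (0.935, 1.000, 0.995, 0.923),
    "Nivea Soft Cream": (0.924, 1.000, 0.995, 0.895),
    "Parachute Coconut Oil": (1.000, 0.819, 0.972, 0.928),
    "Patanjali Dant Kanti": (1.000, 0.985, 0.995, 0.971),
    "Society Tea": (0.878, 1.000, 0.995, 0.845),
    "Tresemmé Conditioner": (0.814, 1.000, 0.995, 0.995),
    "Tresemmé Shampoo": (0.968, 1.000, 0.995, 0.922),
}

# Transpose to columns, then take the plain mean of each metric.
macro = [round(sum(column) / len(rows), 3) for column in zip(*rows.values())]
print(macro)  # [0.935, 0.971, 0.993, 0.933]
```

Because macro averaging weights every class equally, a weak class such as Parachute Coconut Oil (recall 0.819) pulls the average down as much as a strong class lifts it, regardless of how many instances each has.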
Known Limitations & Biases
MiniCPM-V (Invoice Extractor)
| Limitation | Impact | Mitigation |
|---|---|---|
| Only 10 FMCG suppliers in training data | Fails on uncommon distributors (e.g., local regional suppliers) | Collect real invoices from more suppliers post-hackathon |
| Synthetic data (no image degradation, blur) | May struggle with poor-quality photos | Add augmentation (blur, noise, shadows) to training data |
| GST rates hardcoded (5%, 12%, 18%) | Misses 0% or 28% GST items | Parameterize GST rate extraction |
| English-only prompts | Cannot process invoices in regional languages | Add Hindi/Tamil/Marathi templates |
MiniCPM5-1B (Product Normalizer)
| Limitation | Impact | Mitigation |
|---|---|---|
| Synthetic augmentation only | Overfits to rule-based patterns; fails on real-world typos | Collect 200+ real invoices for retraining |
| 200 SKU catalog | Fails on brands outside top 10 suppliers | Expand to 2,000 SKUs (all major Indian FMCG) |
| No regional abbreviations | Tamil/Hindi shortcuts not recognized | Add language-specific abbreviation models |
| No OEM rebrands | Misses store-brand relabeling | Add rebranding patterns post-research |
YOLO26n (Product Detection)
| Limitation | Impact | Mitigation |
|---|---|---|
| Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali) | May underperform on grocery staples (oils, spices, pulses) | Balance class distribution; add 40–50 grocery categories |
| ~11K images across 3 datasets | May not generalize to unlisted brands or novel shelf layouts | Collect 50K+ images via Roboflow community |
| Confidence threshold (0.25) tuned for this dataset | May produce false positives in novel environments | Benchmark on held-out kirana store photos |
| YOLO26n is 8M params (nano) | Edge device deployment not yet tested | Quantize & benchmark on RPi 4, Android |
Fairness & Bias Notes
- Brand bias: Training data skews toward premium Indian brands (Amul, Nestlé, ITC); may underperform on budget/regional brands
- Supplier bias: Only 10 distributors represented; regional cooperatives not included
- Language bias: All training prompts in English; non-English invoices will fail
- Income bias: Kirana store size assumption (₹5–50 lakh inventory); very large or very small stores may see degraded performance
Reproducibility
Seed Control
All scripts use fixed seeds:
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
Roboflow Dataset Versions (Pinned)
- agentsk47/indian-grocery-object-detection: v1 (May 2025)
- iit-patna/grocery_items: v45 (Apr 2026)
- project-c5ho0/indian-market: v2 (Jun 2025)
Training Infrastructure
- Orchestration: Modal (serverless GPUs)
- Fine-tuning Framework: Unsloth 2026.6.1 (LLM), Ultralytics (YOLO)
- Quantization: llama.cpp (GGUF)
- Model Publishing: HuggingFace Hub (huggingface_hub>=0.30.0)
Reproducibility Checklist
- Dataset versions pinned in code
- Random seeds fixed
- Hardware specs documented (A10G, 22 GB VRAM)
- Training duration recorded (~3h 52min across the three training runs; see Hardware & Cost Estimates)
- Evaluation metrics logged post-training
- Cold start (fresh HF account) validation (TODO: test on new account)
Files in This Repository
kirana-invoice-train-data/
├── README.md                          # This file
├── MODEL_CARD.md                      # Model card for HF Hub
├── requirements.txt                   # Python dependencies
│
├── finetune/
│   ├── README.md                      # Training workflow guide
│   ├── generate_invoices.py           # Synthetic invoice generator (500 images)
│   ├── train_minicpm_v.py             # Fine-tune MiniCPM-V (OCR)
│   ├── train_minicpm5_1b.py           # Fine-tune MiniCPM5-1B (normalizer)
│   ├── train_yolo26n.py               # Fine-tune YOLO26n (detection)
│   ├── export_minicpm_v_gguf.py       # Merge LoRA → push merged HF weights
│   ├── push_minicpm_v_to_hf.py        # Push MiniCPM-V LoRA adapter to HF Hub
│   ├── push_minicpm_v_merged_card.py  # Update MiniCPM-V merged model card on HF
│   ├── push_yolo_to_hf.py             # Push YOLO artifacts from Modal volume to HF
│   └── upload_yolo_to_hf.py           # Upload YOLO artifacts from local disk to HF
│
├── data/
│   ├── fmcg_catalog.json              # 200 canonical SKU names + GST rates
│   └── synthetic_invoices/
│       ├── annotations.jsonl
│       ├── printed_gst/               # 125 invoices
│       ├── tally_pdf/                 # 125 invoices
│       ├── handwritten/               # 125 invoices
│       └── whatsapp/                  # 125 invoices
│
└── tests/
    └── test_*.py                      # Unit & integration tests
Hardware & Cost Estimates
Training Cost (Modal On-Demand)
| Model | GPU | Duration | On-Demand Cost |
|---|---|---|---|
| MiniCPM-V | NVIDIA A10G | ~52 min (actual) | ~$1.30 |
| MiniCPM5-1B | NVIDIA A10G | ~1 hour | $1.50 |
| YOLO26n | NVIDIA A10G | ~2 hours | $3.00 |
| Total | - | ~3h 52min (actual) | ~$5.80 |
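The totals follow from a flat hourly rate. The table implies roughly $1.50 per GPU-hour (the 1-hour run costs $1.50); that rate is inferred here, not quoted from Modal's pricing, so treat it as an assumption.

```python
RATE_PER_HOUR = 1.50  # ASSUMPTION: inferred from the 1-hour, $1.50 row

durations_min = {"MiniCPM-V": 52, "MiniCPM5-1B": 60, "YOLO26n": 120}

# Cost per model is duration in hours times the hourly rate.
costs = {model: round(mins / 60 * RATE_PER_HOUR, 2) for model, mins in durations_min.items()}
total_min = sum(durations_min.values())
total_cost = round(sum(costs.values()), 2)

print(costs)
print(f"{total_min // 60}h {total_min % 60}min", total_cost)
```

Re-running the calculation with a different rate or with updated durations is a quick way to re-estimate the budget for Phase 2 retraining.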
Inference Hardware
- Laptop CPU (Intel i7): ~5β10 sec/invoice (MiniCPM-V) + ~2 sec/normalization + ~3 sec/image (YOLO)
- GPU (NVIDIA RTX 3080): ~0.5 sec/invoice + ~0.2 sec/normalization + ~0.1 sec/image
- Edge Device (Raspberry Pi 4): YOLO26n quantized to Q2_K: 30–60 sec/image (untested)
Usage in Production (Kirana Detective App)
Models are downloaded on first run via:
import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image
# Merged weights: no PEFT required
model = AutoModel.from_pretrained(
"build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
trust_remote_code=True,
torch_dtype=torch.bfloat16,
device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(
"build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
trust_remote_code=True,
)
# Inference
image = Image.open("invoice.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "Extract all line items as JSON."]}]
response = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False, max_new_tokens=2048)
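The reply from the model is text, so the JSON still has to be pulled out and decoded before the downstream agents can use it. The helper below is a sketch under the assumption that `model.chat()` returns a single string that contains one JSON object, possibly surrounded by extra words; `parse_invoice_json` is a hypothetical name.

```python
import json
import re


def parse_invoice_json(response):
    """Pull the outermost {...} object out of a model reply and decode it."""
    # Greedy match: from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    return json.loads(match.group(0))


# Made-up reply text, for illustration only.
reply = 'Here is the data: {"supplier": "Distributor Name", "line_items": []} Done.'
parsed = parse_invoice_json(reply)
print(parsed["supplier"], len(parsed["line_items"]))
```

`json.loads` raises `json.JSONDecodeError` on malformed output, which callers can catch to retry the extraction or flag the invoice for manual review.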
Next Steps & Roadmap
Phase 2 (Q3 2026)
- Collect 500 real invoices from partnered kirana stores
- Expand product taxonomy: 200 SKUs → 2,000 SKUs
- Add regional language support (Hindi, Tamil, Marathi, Kannada)
- Fine-tune on invoice degradation (blur, folds, stains)
- Benchmark on edge devices (Raspberry Pi, Android)
Phase 3 (Q4 2026)
- Multi-language MiniCPM5-1B normalizer
- Expand YOLO26n to 50–100 classes (full grocery taxonomy)
- Real-time video product counting via YOLO
- Mobile app (React Native) with offline inference
Research Questions
- How do models perform on store-private labels vs. branded products?
- Can we detect counterfeit products via label anomalies?
- What is the fairness gap for regional vs. national brands?
Licensing & Attribution
- Code: MIT License
- Models:
  - MiniCPM-V: openbmb/MiniCPM-V (Apache 2.0)
  - MiniCPM5-1B: openbmb/MiniCPM5-1B (Apache 2.0)
  - YOLO26n: Ultralytics YOLOv8 (AGPL-3.0)
- Datasets:
  - Roboflow datasets: Individual licenses (CC BY 4.0, CC BY-SA 4.0); check each repo
  - Synthetic invoices: CC0 (public domain)
Contributing
Contributions welcome! Areas of need:
- Real invoice collection: Partner kirana stores to share anonymized invoices
- Regional language templates: Hindi, Tamil, Marathi invoice formats
- Edge device benchmarks: Profile inference on RPi 4, Snapdragon, etc.
- Dataset expansion: Add 1,000+ more products to YOLO26n training
- Fairness audits: Test models on regional/budget brands
Contact & Support
- Author: naazimsnh02
- Issues: GitHub Issues
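- HF Hub Models:
  - build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged
  - build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer
  - build-small-hackathon/yolo26n-indian-fmcg-detection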
Citation
If you use this repository or models in your work, please cite:
@misc{kirana_detective_2026,
author = {Hussain, Syed Naazim},
title = {Kirana Detective: Fine-Tuned Models for Indian Grocery Invoice Auditing},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged}},
}
Version: 1.0