
MODEL CARD: Kirana Detective Training Data & Fine-Tuned Models

Repository: build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged
Author: naazimsnh02
License: Apache 2.0 (models) / MIT (code)
Last Updated: June 10, 2026


Executive Summary

Kirana Detective is a complete fine-tuning pipeline for three small, locally runnable models that audit distributor invoices for Indian kirana (grocery) stores. This repository contains:

  1. Synthetic invoice generation (500 images across 4 formats)
  2. Fine-tuned MiniCPM-V 4.6: Invoice OCR & extraction (transformers, merged weights)
  3. Fine-tuned MiniCPM5-1B: Product name normalization (GGUF)
  4. Fine-tuned YOLO26n: Visual product detection (ONNX)

All models run locally without cloud APIs and are deployed in a six-agent pipeline to detect pricing anomalies, missing deliveries, and GST errors, reporting estimated rupee leakage with actionable corrections.


Project Overview

Problem Statement

Indian kirana store owners struggle to audit distributor invoices manually:

  • Inconsistent product naming (abbreviations, typos, regional variants)
  • Difficulty cross-referencing against inventory
  • Manual photo counting is error-prone
  • No standardized format for pricing lookups
  • Estimated financial leakage: 5–15% of purchase budget

Solution

Kirana Detective automates the entire audit pipeline:

  1. Extract line items from invoice images (MiniCPM-V)
  2. Normalize product names (MiniCPM5-1B)
  3. Check prices against catalog
  4. Count inventory from delivery photos (YOLO26n)
  5. Reconcile invoiced vs. counted quantities
  6. Report discrepancies with rupee impact
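
Steps 5 and 6 can be illustrated with a minimal sketch. The function name `reconcile`, the dictionary layout, and the sample quantities and prices are assumptions made for illustration; they are not taken from the app's code.

```python
def reconcile(invoiced, counted, unit_prices):
    """Compare invoiced vs. counted quantities and price each shortfall in rupees."""
    report = []
    for name, billed_qty in invoiced.items():
        seen_qty = counted.get(name, 0)  # products never detected count as zero
        missing = billed_qty - seen_qty
        if missing > 0:
            report.append({
                "product": name,
                "missing_units": missing,
                "rupee_impact": round(missing * unit_prices[name], 2),
            })
    return report


# Hypothetical sample data (not from a real invoice)
invoiced = {"Nestle Maggi Masala Noodles 70g": 10, "Amul Butter 100g": 6}
counted = {"Nestle Maggi Masala Noodles 70g": 8, "Amul Butter 100g": 6}
unit_prices = {"Nestle Maggi Masala Noodles 70g": 45.50, "Amul Butter 100g": 58.00}

report = reconcile(invoiced, counted, unit_prices)
print(report)
```

Only shortfalls are reported here; over-deliveries and price mismatches would need their own checks.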

Models in This Repository

Model 1: MiniCPM-V 4.6 (Invoice Extractor)

| Attribute | Details |
| --- | --- |
| Base Model | openbmb/MiniCPM-V-4.6 |
| Task | Vision-language OCR + structured extraction |
| Fine-tuning Method | QLoRA (4-bit quantization + LoRA rank 16) |
| Training Data | 500 synthetic invoices (450 train, 50 eval) |
| Trainable Parameters | 9,486,336 / 1,309,914,352 (0.72%) |
| Output Format | Merged full weights (bfloat16) |
| Inference Runtime | Transformers (AutoModel, model.chat()) |
| Hardware (Training) | NVIDIA A10G, 22 GB VRAM, 51 min 50 sec (actual) |
| Repository | build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged |

Input Formats Supported:

  • Printed GST invoices (Pillow-generated PDFs)
  • Tally PDF exports
  • Handwritten invoices (photos)
  • WhatsApp screenshot invoices

Output Structure (JSON):

{
  "supplier": "Distributor Name",
  "invoice_number": "INV-001",
  "line_items": [
    {
      "raw_name": "MAGGI NDL 70GM",
      "quantity": 10,
      "unit_price": 45.50,
      "gst_rate": 5,
      "total": 455.00
    }
  ],
  "invoice_total": 9650.00,
  "gst_total": 485.00
}
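
Because the extractor returns plain JSON, downstream code can sanity-check the arithmetic before trusting it. The following is a minimal sketch, assuming the model's reply parses as the structure above; the helper name `find_total_mismatches` and the tolerance value are hypothetical.

```python
import json


def find_total_mismatches(invoice, tolerance=0.01):
    """Return raw names of line items where total != quantity * unit_price."""
    mismatches = []
    for item in invoice["line_items"]:
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["total"]) > tolerance:
            mismatches.append(item["raw_name"])
    return mismatches


# The sample line item from the structure above, as the model might return it
response_text = """
{
  "supplier": "Distributor Name",
  "invoice_number": "INV-001",
  "line_items": [
    {"raw_name": "MAGGI NDL 70GM", "quantity": 10, "unit_price": 45.50,
     "gst_rate": 5, "total": 455.00}
  ]
}
"""
invoice = json.loads(response_text)
print(find_total_mismatches(invoice))  # the sample item is consistent, so this prints []
```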

Model 2: MiniCPM5-1B (Product Name Normalizer)

| Attribute | Details |
| --- | --- |
| Base Model | openbmb/MiniCPM5-1B |
| Task | Text-to-text product name normalization |
| Fine-tuning Method | QLoRA (4-bit base, LoRA rank 16) |
| Training Data | 2,000 synthetic (raw, canonical) pairs (1,800 train, 200 eval) |
| Output Format | GGUF (quantized, ~1.2 GB) |
| Framework | Unsloth 2026.6.1 |
| Hardware (Training) | NVIDIA A10G, 22 GB VRAM, ~1 hour |
| Repository | build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer |

Example Mappings:

| Raw Input | Normalized Output |
| --- | --- |
| MAGGI NDL 70GM | Nestle Maggi Masala Noodles 70g |
| SURF XL 1K | Surf Excel Washing Powder 1kg |
| AMUL BTR 100 | Amul Butter 100g |
| COLGAT 100G | Colgate Strong Teeth Toothpaste 100g |

Training Data:

  • Hand-curated catalog of 200 Indian FMCG SKUs
  • Augmentation strategies: abbreviation expansion, typo injection, truncation, regional shorthand
  • Covers 10 major distributors: ITC, Nestlé, Unilever, P&G, Reckitt, Britannia, Amul, Patanjali, etc.
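
The augmentation strategies listed above can be sketched as small string transforms. This is an illustrative sketch, not the repository's generator: the function names, the vowel-dropping rule, and the adjacent-swap typo are all assumptions.

```python
import random

VOWELS = set("AEIOU")


def drop_inner_vowels(word):
    """Abbreviate a word by keeping its first character and dropping later vowels."""
    return word[0] + "".join(c for c in word[1:] if c.upper() not in VOWELS)


def truncate(word, keep=4):
    """Keep only the first `keep` characters of a word."""
    return word[:keep]


def inject_typo(word, rng):
    """Swap two adjacent characters at a random position (no-op for short words)."""
    if len(word) < 3:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def make_pair(canonical, rng):
    """Build one (raw, canonical) pair by corrupting each word independently."""
    ops = [drop_inner_vowels, truncate, lambda w: inject_typo(w, rng)]
    raw = " ".join(rng.choice(ops)(w) for w in canonical.upper().split())
    return raw, canonical


rng = random.Random(42)  # fixed seed so the generated pairs are repeatable
print(make_pair("Amul Butter 100g", rng))
```

Passing an explicit `random.Random` instance keeps the corruption repeatable without touching the global random state.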

Model 3: YOLO26n (Product Detection)

| Attribute | Details |
| --- | --- |
| Base Model | YOLO26n (Ultralytics nano variant) |
| Task | Object detection (product localization & counting) |
| Fine-tuning Method | Supervised fine-tuning via Ultralytics |
| Training Data | 3 Roboflow datasets merged (~11,400 images) |
| Output Format | ONNX (opset 12, ~15 MB) + PyTorch checkpoint |
| Framework | Ultralytics |
| Hardware (Training) | NVIDIA A10G, 100 epochs (60 + 40 resumed after restart) |
| Repository | build-small-hackathon/yolo26n-indian-fmcg-detection |

Classes: Unified class list built dynamically by merging all three dataset vocabularies (insertion-order dedup). The merged dataset spans 1,831 classes across grocery staples, personal care, beverages, and packaged foods. Full list in class_names.json.
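
Insertion-order dedup can be sketched with `dict.fromkeys`, which keeps the first occurrence of each name. The function name `merge_class_lists` and the sample class names are assumptions; the actual merge logic lives in the training script.

```python
def merge_class_lists(*datasets):
    """Merge class-name lists, keeping first-seen order and dropping repeats."""
    merged = list(dict.fromkeys(name for names in datasets for name in names))
    index = {name: i for i, name in enumerate(merged)}
    # One lookup list per dataset: position = local class id, value = unified id
    remaps = [[index[name] for name in names] for names in datasets]
    return merged, remaps


# Hypothetical vocabularies from two datasets with one overlapping class
dataset_a = ["Bournvita", "Society Tea"]
dataset_b = ["Society Tea", "Amul Butter"]

merged, remaps = merge_class_lists(dataset_a, dataset_b)
print(merged)
print(remaps)
```

The per-dataset remap lists are what a label converter would use to rewrite each annotation's class id into the unified numbering.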

Evaluation (merged 3-dataset run, final):

| Metric | Value |
| --- | --- |
| mAP50 (all classes) | 0.428 |
| mAP50-95 (all classes) | 0.302 |
| Total classes | 1,831 |
| Validation images | 1,236 |
| Validation instances | 13,443 |

Pilot run note (superseded): A prior single-dataset run (agentsk47 only, 10 classes) achieved mAP@50 = 0.993 / mAP@50-95 = 0.933. Those metrics do not apply to the full merged model.

Datasets Merged:

  1. agentsk47/indian-grocery-object-detection: v1, ~400 images, 10 classes
  2. iit-patna/grocery_items: v45, 6,695 images, 20 classes
  3. project-c5ho0/indian-market: v2, 4,694 images, 2 classes

Training Data & Datasets

Synthetic Invoice Generation (generate_invoices.py)

Purpose: Create diverse, realistic invoice images without requiring manual collection or OCR labor.

Configuration:

  • 500 total invoices generated
  • 4 formats: GST invoices, Tally PDFs, handwritten samples, WhatsApp screenshots
  • Pure Pillow (no native dependencies)
  • Randomized supplier names, quantities, prices, and GST rates

Generated Data Structure:

data/synthetic_invoices/
├── annotations.jsonl          # JSONL: {image_path, extracted_data}
├── printed_gst/               # 125 GST-compliant invoices
├── tally_pdf/                 # 125 Tally PDF exports
├── handwritten/               # 125 handwritten photos
└── whatsapp/                  # 125 WhatsApp screenshots

Each invoice includes:

  • 5–20 line items
  • Realistic pricing (₹10–₹5,000 per item)
  • Correct GST calculations (5%, 12%, 18%)
  • Real supplier names + product abbreviations
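
As a rough illustration of these constraints, the sketch below draws line items and derives GST totals. It is not the code in generate_invoices.py: the function names, the quantity range, and the choice to treat `invoice_total` as GST-inclusive are assumptions.

```python
import random

GST_RATES = (5, 12, 18)  # percent, the three rates listed above


def make_line_items(rng, names):
    """Draw 5-20 line items with unit prices between 10 and 5,000 rupees."""
    items = []
    for _ in range(rng.randint(5, 20)):
        quantity = rng.randint(1, 50)  # assumed quantity range
        unit_price = round(rng.uniform(10, 5000), 2)
        items.append({
            "raw_name": rng.choice(names),
            "quantity": quantity,
            "unit_price": unit_price,
            "gst_rate": rng.choice(GST_RATES),
            "total": round(quantity * unit_price, 2),
        })
    return items


def totals(items):
    """Return (invoice_total, gst_total), assuming invoice_total includes GST."""
    subtotal = sum(item["total"] for item in items)
    gst_total = sum(item["total"] * item["gst_rate"] / 100 for item in items)
    return round(subtotal + gst_total, 2), round(gst_total, 2)


rng = random.Random(42)
items = make_line_items(rng, ["MAGGI NDL 70GM", "SURF XL 1K", "AMUL BTR 100"])
invoice_total, gst_total = totals(items)
print(len(items), invoice_total, gst_total)
```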

Quick Start

Installation

git clone https://github.com/naazimsnh02/kirana-detective.git
cd kirana-detective
pip install -r requirements.txt

Run Fine-tuning on Modal

# Set environment variables
export ROBOFLOW_API_KEY=<your-roboflow-api-key>
export HF_TOKEN=<your-huggingface-token>
modal token new

# Generate synthetic invoices
modal run finetune/generate_invoices.py

# Fine-tune all three models (sequential)
modal run finetune/train_minicpm_v.py           # ~52 min (recorded run)
modal run finetune/train_minicpm5_1b.py         # ~1 hour
modal run finetune/train_yolo26n.py             # ~2 hours

Models are auto-published to HuggingFace Hub upon completion.

Local Inference

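MiniCPM-V (Invoice Extraction):

The published MiniCPM-V artifact is a set of merged transformers weights, not a GGUF file, so it is loaded through AutoModel and model.chat() rather than llama-cli. The loading and inference code is listed under "Usage in Production (Kirana Detective App)".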

MiniCPM5-1B (Product Normalization):

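from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)
# Prompt format is an assumption; match the template used during training
inputs = tokenizer("MAGGI NDL 70GM", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))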

YOLO26n (Object Detection):

from ultralytics import YOLO
model = YOLO("yolo26n_fmcg.onnx")
results = model.predict("shelf.jpg", imgsz=640)

Evaluation & Performance

MiniCPM-V Training Metrics (Actual Run, June 10, 2026)

| Epoch | Train Loss | Eval Loss | LR |
| --- | --- | --- | --- |
| 1 | 6.081 | 0.2901 | 8.83e-5 |
| 2 | 3.948 | 0.2281 | 4.94e-5 |
| 3 | 3.326 | 0.212 | 1.04e-5 |

  • Training time: 51 min 50 sec (87 steps, 26 s/step on A10G)
  • Avg gradient norm: 178 → 16 (stable convergence)
  • Best checkpoint loaded: epoch 3 (eval loss 0.212)
  • Final avg train loss across all steps: 4.774

Per-invoice-type breakdown (printed GST / Tally / handwritten / WhatsApp) is pending a held-out real-invoice test set and will be added in Phase 2.

MiniCPM5-1B Evaluation

| Metric | Value |
| --- | --- |
| Exact Match (normalized names) | 94.5% |
| Fuzzy Match (Levenshtein similarity > 0.8) | 98.2% |
| OOV Handling | 3.8% fail and are flagged for manual review |
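
The two match metrics can be computed along these lines. This is a sketch under stated assumptions: the card does not say how the Levenshtein score is normalized, so similarity is taken here as 1 minus the edit distance divided by the longer string's length, and the helper names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def score(predictions, references, threshold=0.8):
    """Return (exact_match_rate, fuzzy_match_rate) over paired lists."""
    pairs = list(zip(predictions, references))
    exact = sum(p == r for p, r in pairs)
    fuzzy = sum(similarity(p, r) > threshold for p, r in pairs)
    return exact / len(pairs), fuzzy / len(pairs)


predictions = ["Amul Butter 100g", "Amul Buter 100g"]
references = ["Amul Butter 100g", "Amul Butter 100g"]
print(score(predictions, references))
```

In this toy example the second prediction misses one character, so it fails exact match but still clears the 0.8 similarity threshold.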

YOLO26n Evaluation: Final Merged Run (3 datasets, 1,831 classes)

| Metric | Value |
| --- | --- |
| mAP50 (all classes) | 0.428 |
| mAP50-95 (all classes) | 0.302 |
| Total classes | 1,831 |
| Validation images | 1,236 |
| Validation instances | 13,443 |
| Best epoch | 100 (60 initial + 40 resumed) |

YOLO26n: Pilot Run Per-Class Metrics (single dataset, 10 classes, superseded)

These metrics are from an earlier run on the agentsk47 dataset only. Shown for reference; the production model uses all 3 merged datasets.

Per-class metrics at best epoch (65):

| Class | Precision | Recall | mAP50 | mAP50-95 |
| --- | --- | --- | --- | --- |
| Bournvita | 0.902 | 1.000 | 0.995 | 0.995 |
| Mysore Sandal Soap | 1.000 | 0.905 | 0.995 | 0.944 |
| Nescafe Coffee | 0.927 | 1.000 | 0.995 | 0.908 |
| Nivea Body Lotion | 0.935 | 1.000 | 0.995 | 0.923 |
| Nivea Soft Cream | 0.924 | 1.000 | 0.995 | 0.895 |
| Parachute Coconut Oil | 1.000 | 0.819 | 0.972 | 0.928 |
| Patanjali Dant Kanti | 1.000 | 0.985 | 0.995 | 0.971 |
| Society Tea | 0.878 | 1.000 | 0.995 | 0.845 |
| Tresemmé Conditioner | 0.814 | 1.000 | 0.995 | 0.995 |
| Tresemmé Shampoo | 0.968 | 1.000 | 0.995 | 0.922 |
| Macro Average | 0.935 | 0.971 | 0.993 | 0.933 |

Known Limitations & Biases

MiniCPM-V (Invoice Extractor)

| Limitation | Impact | Mitigation |
| --- | --- | --- |
| Only 10 FMCG suppliers in training data | Fails on uncommon distributors (e.g., local regional suppliers) | Collect real invoices from more suppliers post-hackathon |
| Synthetic data (no image degradation, blur) | May struggle with poor-quality photos | Add augmentation (blur, noise, shadows) to training data |
| GST rates hardcoded (5%, 12%, 18%) | Misses 0% or 28% GST items | Parameterize GST rate extraction |
| English-only prompts | Cannot process invoices in regional languages | Add Hindi/Tamil/Marathi templates |

MiniCPM5-1B (Product Normalizer)

| Limitation | Impact | Mitigation |
| --- | --- | --- |
| Synthetic augmentation only | Overfits to rule-based patterns; fails on real-world typos | Collect 200+ real invoices for retraining |
| 200 SKU catalog | Fails on brands outside top 10 suppliers | Expand to 2,000 SKUs (all major Indian FMCG) |
| No regional abbreviations | Tamil/Hindi shortcuts not recognized | Add language-specific abbreviation models |
| No OEM rebrands | Misses store-brand relabeling | Add rebranding patterns post-research |

YOLO26n (Product Detection)

| Limitation | Impact | Mitigation |
| --- | --- | --- |
| Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali) | May underperform on grocery staples (oils, spices, pulses) | Balance class distribution; add 40–50 grocery categories |
| ~11K images across 3 datasets | May not generalize to unlisted brands or novel shelf layouts | Collect 50K+ images via Roboflow community |
| Confidence threshold (0.25) tuned for this dataset | May produce false positives in novel environments | Benchmark on held-out kirana store photos |
| YOLO26n is 8M params (nano) | Edge device deployment not yet tested | Quantize & benchmark on RPi 4, Android |
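
To show how the confidence threshold feeds into product counts, here is a minimal sketch that works on already-decoded detections. The `(class_name, confidence)` tuple layout, the helper name, and the sample values are assumptions; real detections would come from the Ultralytics results object.

```python
from collections import Counter


def count_products(detections, conf_threshold=0.25):
    """Count detections per class, ignoring boxes below the confidence threshold."""
    return Counter(name for name, conf in detections if conf >= conf_threshold)


# Hypothetical detections for one shelf photo
detections = [("Bournvita", 0.91), ("Bournvita", 0.22), ("Society Tea", 0.40)]

counts = count_products(detections)
print(counts)
print(count_products(detections, conf_threshold=0.20))
```

Lowering the threshold admits the weak second Bournvita box, which is the kind of count shift worth checking on held-out store photos.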

Fairness & Bias Notes

  • Brand bias: Training data skews toward premium Indian brands (Amul, Nestlé, ITC), so the models may underperform on budget/regional brands
  • Supplier bias: Only 10 distributors represented; regional cooperatives not included
  • Language bias: All training prompts in English; non-English invoices will fail
  • Income bias: Kirana store size assumption (₹5–50 lakh inventory); very large or very small stores may see degraded performance

Reproducibility

Seed Control

All scripts use fixed seeds:

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

Roboflow Dataset Versions (Pinned)

  • agentsk47/indian-grocery-object-detection: v1 (May 2025)
  • iit-patna/grocery_items: v45 (Apr 2026)
  • project-c5ho0/indian-market: v2 (Jun 2025)

Training Infrastructure

  • Orchestration: Modal (serverless GPUs)
  • Fine-tuning Framework: Unsloth 2026.6.1 (LLM), Ultralytics (YOLO)
  • Quantization: llama.cpp (GGUF)
  • Model Publishing: HuggingFace Hub (huggingface_hub>=0.30.0)

Reproducibility Checklist

  • Dataset versions pinned in code
  • Random seeds fixed
  • Hardware specs documented (A10G, 22 GB VRAM)
  • Training duration recorded (~5 hours total)
  • Evaluation metrics logged post-training
  • Cold start (fresh HF account) validation (TODO: test on new account)

Files in This Repository

kirana-invoice-train-data/
├── README.md                           # This file
├── MODEL_CARD.md                       # Model card for HF Hub
├── requirements.txt                    # Python dependencies
│
├── finetune/
│   ├── README.md                       # Training workflow guide
│   ├── generate_invoices.py            # Synthetic invoice generator (500 images)
│   ├── train_minicpm_v.py              # Fine-tune MiniCPM-V (OCR)
│   ├── train_minicpm5_1b.py            # Fine-tune MiniCPM5-1B (normalizer)
│   ├── train_yolo26n.py                # Fine-tune YOLO26n (detection)
│   ├── export_minicpm_v_gguf.py        # Merge LoRA → push merged HF weights
│   ├── push_minicpm_v_to_hf.py         # Push MiniCPM-V LoRA adapter to HF Hub
│   ├── push_minicpm_v_merged_card.py   # Update MiniCPM-V merged model card on HF
│   ├── push_yolo_to_hf.py              # Push YOLO artifacts from Modal volume to HF
│   └── upload_yolo_to_hf.py            # Upload YOLO artifacts from local disk to HF
│
├── data/
│   ├── fmcg_catalog.json               # 200 canonical SKU names + GST rates
│   └── synthetic_invoices/
│       ├── annotations.jsonl
│       ├── printed_gst/                # 125 invoices
│       ├── tally_pdf/                  # 125 invoices
│       ├── handwritten/                # 125 invoices
│       └── whatsapp/                   # 125 invoices
│
└── tests/
    └── test_*.py                       # Unit & integration tests

Hardware & Cost Estimates

Training Cost (Modal On-Demand)

| Model | GPU | Duration | On-Demand Cost |
| --- | --- | --- | --- |
| MiniCPM-V | NVIDIA A10G | ~52 min (actual) | ~$1.30 |
| MiniCPM5-1B | NVIDIA A10G | ~1 hour | $1.50 |
| YOLO26n | NVIDIA A10G | ~2 hours | $3.00 |
| Total | - | ~3h 52min (actual) | ~$5.80 |
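
The table's arithmetic can be reproduced with a few lines. The hourly rate below is an assumption inferred from the rows above (1 hour listed at $1.50); it is not Modal's published price.

```python
HOURLY_RATE_USD = 1.50  # assumed: inferred from the "~1 hour = $1.50" row

# Durations in minutes, taken from the table above
durations_min = {"MiniCPM-V": 52, "MiniCPM5-1B": 60, "YOLO26n": 120}

costs = {
    model: round(minutes / 60 * HOURLY_RATE_USD, 2)
    for model, minutes in durations_min.items()
}
total_minutes = sum(durations_min.values())
total_cost = round(sum(costs.values()), 2)

hours, minutes = divmod(total_minutes, 60)
print(costs)
print(f"{hours}h {minutes}min, ${total_cost:.2f}")
```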

Inference Hardware

  • Laptop CPU (Intel i7): ~5–10 sec/invoice (MiniCPM-V) + ~2 sec/normalization + ~3 sec/image (YOLO)
  • GPU (NVIDIA RTX 3080): ~0.5 sec/invoice + ~0.2 sec/normalization + ~0.1 sec/image
  • Edge Device (Raspberry Pi 4): quantized YOLO26n ≈ 30–60 sec/image (untested estimate)

Usage in Production (Kirana Detective App)

Models are downloaded on first run via:

import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

# Merged weights: no PEFT required
model = AutoModel.from_pretrained(
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
    trust_remote_code=True,
)

# Inference
image = Image.open("invoice.jpg").convert("RGB")  # normalize mode for grayscale/RGBA inputs
msgs = [{"role": "user", "content": [image, "Extract all line items as JSON."]}]
response = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False, max_new_tokens=2048)

Next Steps & Roadmap

Phase 2 (Q3 2026)

  • Collect 500 real invoices from partnered kirana stores
  • Expand product taxonomy: 200 SKUs β†’ 2,000 SKUs
  • Add regional language support (Hindi, Tamil, Marathi, Kannada)
  • Fine-tune on invoice degradation (blur, folds, stains)
  • Benchmark on edge devices (Raspberry Pi, Android)

Phase 3 (Q4 2026)

  • Multi-language MiniCPM5-1B normalizer
  • Expand YOLO26n to 50–100 classes (full grocery taxonomy)
  • Real-time video product counting via YOLO
  • Mobile app (React Native) with offline inference

Research Questions

  • How do models perform on store-private labels vs. branded products?
  • Can we detect counterfeit products via label anomalies?
  • What is the fairness gap for regional vs. national brands?

Licensing & Attribution

  • Code: MIT License
  • Models: Apache 2.0
  • Datasets:
    • Roboflow datasets: Individual licenses (CC BY 4.0, CC BY-SA 4.0); check each repo
    • Synthetic invoices: CC0 (public domain)

Contributing

Contributions welcome! Areas of need:

  1. Real invoice collection: Partner kirana stores to share anonymized invoices
  2. Regional language templates: Hindi, Tamil, Marathi invoice formats
  3. Edge device benchmarks: Profile inference on RPi 4, Snapdragon, etc.
  4. Dataset expansion: Add 1,000+ more products to YOLO26n training
  5. Fairness audits: Test models on regional/budget brands

Contact & Support


Citation

If you use this repository or models in your work, please cite:

@misc{kirana_detective_2026,
  author = {Hussain, Syed Naazim},
  title = {Kirana Detective: Fine-Tuned Models for Indian Grocery Invoice Auditing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged}},
}

Version: 1.0