MODEL CARD: Kirana Detective Training Data & Fine-Tuned Models

Repository: build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged
Author: naazimsnh02
License: Apache 2.0 (MiniCPM models) / AGPL-3.0 (YOLO26n, via Ultralytics) / MIT (code)
Last Updated: June 10, 2026

Executive Summary

Kirana Detective is an end-to-end fine-tuning pipeline for three compact models that together audit distributor invoices for Indian kirana (grocery) stores. This repository contains:

Synthetic invoice generation (500 images across 4 formats)
Fine-tuned MiniCPM-V 4.6 — Invoice OCR & extraction (transformers, merged weights)
Fine-tuned MiniCPM5-1B — Product name normalization (GGUF)
Fine-tuned YOLO26n — Visual product detection (ONNX)

All models run locally without cloud APIs and are deployed in a six-agent pipeline to detect pricing anomalies, missing deliveries, and GST errors, reporting estimated rupee leakage with actionable corrections.

Project Overview

Problem Statement

Indian kirana store owners struggle to audit distributor invoices manually:

Inconsistent product naming (abbreviations, typos, regional variants)
Difficulty cross-referencing against inventory
Manual photo counting is error-prone
No standardized format for pricing lookups
Estimated financial leakage: 5–15% of purchase budget

Solution

Kirana Detective automates the entire audit pipeline:

Extract line items from invoice images (MiniCPM-V)
Normalize product names (MiniCPM5-1B)
Check prices against catalog
Count inventory from delivery photos (YOLO26n)
Reconcile invoiced vs. counted quantities
Report discrepancies with rupee impact
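The reconcile and report steps reduce to arithmetic once the extractor, normalizer, and detector have produced their outputs. Below is a minimal sketch. The names `LineItem` and `reconcile` are hypothetical. It assumes each unit that was billed but not counted is valued at its invoiced unit price. The agents in the app may apply different rules, for example tolerances or price checks. The quantities and the Amul Butter price are illustrative values only.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    canonical_name: str   # normalizer output (hypothetical field name)
    invoiced_qty: int     # quantity extracted from the invoice
    unit_price: float     # invoiced price per unit, in rupees


def reconcile(items, counted):
    """Compare invoiced quantities with counted quantities.

    `counted` maps canonical product name -> units detected in delivery photos.
    Returns (discrepancies, total_leakage_in_rupees).
    """
    discrepancies = []
    total_leakage = 0.0
    for item in items:
        seen = counted.get(item.canonical_name, 0)
        shortfall = item.invoiced_qty - seen
        if shortfall > 0:
            # Units billed but not delivered, valued at the invoiced unit price
            impact = round(shortfall * item.unit_price, 2)
            discrepancies.append((item.canonical_name, shortfall, impact))
            total_leakage += impact
    return discrepancies, round(total_leakage, 2)


items = [
    LineItem("Nestle Maggi Masala Noodles 70g", 10, 45.50),
    LineItem("Amul Butter 100g", 5, 56.00),
]
counted = {"Nestle Maggi Masala Noodles 70g": 8, "Amul Butter 100g": 5}
print(reconcile(items, counted))
# → ([('Nestle Maggi Masala Noodles 70g', 2, 91.0)], 91.0)
```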

Models in This Repository

Model 1: MiniCPM-V 4.6 (Invoice Extractor)

Attribute	Details
Base Model	openbmb/MiniCPM-V-4.6
Task	Vision-language OCR + structured extraction
Fine-tuning Method	QLoRA (4-bit quantization + LoRA rank 16)
Training Data	500 synthetic invoices (450 train, 50 eval)
Trainable Parameters	9,486,336 / 1,309,914,352 (0.72%)
Output Format	Merged full weights (bfloat16)
Inference Runtime	Transformers (`AutoModel`, `model.chat()`)
Hardware (Training)	NVIDIA A10G, 22 GB VRAM, 51 min 50 sec (actual)
Repository	`build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`
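The trainable-parameter figure follows from how LoRA works. Each adapted weight matrix W (d_out × d_in) stays frozen and gains two low-rank factors, B (d_out × r) and A (r × d_in). That adds r·(d_in + d_out) trainable parameters per layer. Under QLoRA the frozen base is also stored in 4-bit.

The sketch below shows this arithmetic and reproduces the 0.72% share reported in the table. The 2048 × 2048 layer is a hypothetical shape for illustration, not MiniCPM-V 4.6's actual layer inventory. The helper names are also hypothetical.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_pct(trainable: int, total: int) -> float:
    """Share of all parameters that are trainable, in percent."""
    return 100.0 * trainable / total


# Hypothetical 2048 x 2048 projection adapted at rank 16 (illustrative shape only)
print(lora_params(2048, 2048, 16))  # → 65536

# Figures reported in the Model 1 table above
print(round(trainable_pct(9_486_336, 1_309_914_352), 2))  # → 0.72
```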

Input Formats Supported:

Printed GST invoices (Pillow-generated PDFs)
Tally PDF exports
Handwritten invoices (photos)
WhatsApp screenshot invoices

Output Structure (JSON):

{
  "supplier": "Distributor Name",
  "invoice_number": "INV-001",
  "line_items": [
    {
      "raw_name": "MAGGI NDL 70GM",
      "quantity": 10,
      "unit_price": 45.50,
      "gst_rate": 5,
      "total": 455.00
    }
  ],
  "invoice_total": 9650.00,
  "gst_total": 485.00
}
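Because the pipeline flags GST errors, extracted line items can be sanity-checked against this schema before any price lookup. Below is a minimal sketch; `check_line_items` is a hypothetical helper. It makes two assumptions:

- `total` is quantity × unit_price before GST, which matches the example (10 × 45.50 = 455.00).
- The accepted rates are the 0%, 5%, 12%, 18%, and 28% slabs mentioned in this card.

The second invoice line is deliberately wrong to show both checks firing.

```python
ALLOWED_GST_RATES = {0, 5, 12, 18, 28}  # assumption: slabs mentioned in this card


def check_line_items(invoice: dict, tol: float = 0.01) -> list[str]:
    """Return human-readable problems found in an extracted invoice dict.

    Assumes `total` is quantity x unit_price before GST.
    """
    problems = []
    for i, item in enumerate(invoice.get("line_items", [])):
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["total"]) > tol:
            problems.append(
                f"line {i}: total {item['total']:.2f} != qty x price {expected:.2f}"
            )
        if item["gst_rate"] not in ALLOWED_GST_RATES:
            problems.append(f"line {i}: unexpected GST rate {item['gst_rate']}%")
    return problems


invoice = {
    "line_items": [
        {"raw_name": "MAGGI NDL 70GM", "quantity": 10, "unit_price": 45.50,
         "gst_rate": 5, "total": 455.00},
        {"raw_name": "AMUL BTR 100", "quantity": 4, "unit_price": 50.00,
         "gst_rate": 7, "total": 210.00},  # deliberately inconsistent
    ]
}
print(check_line_items(invoice))
# → ['line 1: total 210.00 != qty x price 200.00', 'line 1: unexpected GST rate 7%']
```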

Model 2: MiniCPM5-1B (Product Name Normalizer)

Attribute	Details
Base Model	openbmb/MiniCPM5-1B
Task	Text-to-text product name normalization
Fine-tuning Method	QLoRA (4-bit base, LoRA rank 16)
Training Data	2,000 synthetic (raw, canonical) pairs (1,800 train, 200 eval)
Output Format	GGUF (quantized, ~1.2 GB)
Framework	Unsloth 2026.6.1
Hardware (Training)	NVIDIA A10G, 22 GB VRAM, ~1 hour
Repository	`build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`

Example Mappings:

Raw Input	Normalized Output
`MAGGI NDL 70GM`	Nestle Maggi Masala Noodles 70g
`SURF XL 1K`	Surf Excel Washing Powder 1kg
`AMUL BTR 100`	Amul Butter 100g
`COLGAT 100G`	Colgate Strong Teeth Toothpaste 100g

Training Data:

Hand-curated catalog of 200 Indian FMCG SKUs
Augmentation strategies: abbreviation expansion, typo injection, truncation, regional shorthand
Covers 10 major distributors: ITC, Nestlé, Unilever, P&G, Reckitt, Britannia, Amul, Patanjali, etc.
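Three of the augmentation strategies above can be sketched as small string transforms: abbreviation, typo injection, and truncation. Each one is applied to a canonical catalog name to produce the "raw" side of a training pair. Several parts of this sketch are assumptions:

- The abbreviation table is hypothetical.
- Typo injection is modelled as a single-letter deletion.
- The mixing probabilities are made up.

Regional shorthand is left out because it needs a curated lookup table. None of this is taken from `train_minicpm5_1b.py`.

```python
import random

# Hypothetical abbreviation table; the real rules live in the training script
ABBREVIATIONS = {"noodles": "NDL", "butter": "BTR", "washing": "WSH", "powder": "PWD"}


def abbreviate(name: str) -> str:
    """Replace known words with distributor-style abbreviations and upper-case."""
    words = [ABBREVIATIONS.get(w.lower(), w) for w in name.split()]
    return " ".join(words).upper()


def inject_typo(name: str, rng: random.Random) -> str:
    """Delete one random letter to mimic a typing error."""
    positions = [i for i, c in enumerate(name) if c.isalpha()]
    if not positions:
        return name
    i = rng.choice(positions)
    return name[:i] + name[i + 1:]


def truncate(name: str, max_words: int = 3) -> str:
    """Keep only the first few words, as narrow invoice columns often do."""
    return " ".join(name.split()[:max_words])


def make_pairs(canonical: str, n: int, seed: int = 42) -> list[tuple[str, str]]:
    """Generate n (raw, canonical) training pairs from one catalog entry."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        raw = abbreviate(canonical)
        if rng.random() < 0.5:   # assumed probability
            raw = truncate(raw)
        if rng.random() < 0.3:   # assumed probability
            raw = inject_typo(raw, rng)
        pairs.append((raw, canonical))
    return pairs


print(abbreviate("Amul Butter 100g"))  # → AMUL BTR 100G
print(make_pairs("Nestle Maggi Masala Noodles 70g", 3))
```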

Model 3: YOLO26n (Product Detection)

Attribute	Details
Base Model	YOLOv8 Nano
Task	Object detection (product localization & counting)
Fine-tuning Method	Supervised fine-tuning via Ultralytics
Training Data	3 Roboflow datasets merged (~11,400 images)
Output Format	ONNX (opset 12, ~15 MB) + PyTorch checkpoint
Framework	Ultralytics YOLOv8
Hardware (Training)	NVIDIA A10G, 100 epochs (60 + 40 resumed after restart)
Repository	`build-small-hackathon/yolo26n-indian-fmcg-detection`

Classes: Unified class list built dynamically by merging all three dataset vocabularies (insertion-order dedup). The merged dataset spans 1,831 classes across grocery staples, personal care, beverages, and packaged foods. Full list in class_names.json.
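Insertion-order dedup can be expressed with `dict.fromkeys`, which keeps the first occurrence of each name. Merging also requires rewriting each dataset's local class ids to the unified ids. Below is a minimal sketch that makes three assumptions:

- Class names are compared as exact strings, with no case folding or synonym matching.
- Labels are in YOLO text format (`cls cx cy w h`).
- The class names in the example are illustrative.

`merge_vocabularies` and `remap_label_line` are hypothetical helpers, not functions from `train_yolo26n.py`.

```python
def merge_vocabularies(vocabs: list[list[str]]) -> tuple[list[str], list[dict[int, int]]]:
    """Merge per-dataset class lists, keeping first-seen order and dropping duplicates.

    Returns the unified class list plus, for each dataset, a map from its local
    class id to the unified id (needed to rewrite YOLO label files).
    """
    unified = list(dict.fromkeys(name for vocab in vocabs for name in vocab))
    index = {name: i for i, name in enumerate(unified)}
    remaps = [{local: index[name] for local, name in enumerate(vocab)} for vocab in vocabs]
    return unified, remaps


def remap_label_line(line: str, remap: dict[int, int]) -> str:
    """Rewrite the class id at the start of one YOLO label line ('cls cx cy w h')."""
    cls, *coords = line.split()
    return " ".join([str(remap[int(cls)]), *coords])


unified, remaps = merge_vocabularies([["Bournvita", "Society Tea"], ["Society Tea", "Amul Butter"]])
print(unified)    # → ['Bournvita', 'Society Tea', 'Amul Butter']
print(remaps[1])  # → {0: 1, 1: 2}
print(remap_label_line("1 0.5 0.5 0.2 0.3", remaps[1]))  # → 2 0.5 0.5 0.2 0.3
```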

Evaluation (merged 3-dataset run — final):

Metric	Value
mAP50 (all classes)	0.428
mAP50-95 (all classes)	0.302
Total classes	1,831
Validation images	1,236
Validation instances	13,443

Pilot run note (superseded): A prior single-dataset run (agentsk47 only, 10 classes) achieved mAP@50 = 0.993 / mAP@50-95 = 0.933. Those metrics do not apply to the full merged model.

Datasets Merged:

agentsk47/indian-grocery-object-detection — v1, ~400 images, 10 classes
iit-patna/grocery_items — v45, 6,695 images, 20 classes
project-c5ho0/indian-market — v2, 4,694 images, 2 classes

Training Data & Datasets

Synthetic Invoice Generation (`generate_invoices.py`)

Purpose: Create diverse, realistic invoice images without requiring manual collection or OCR labor.

Configuration:

500 total invoices generated
4 formats: GST invoices, Tally PDFs, handwritten samples, WhatsApp screenshots
Rendered with Pillow alone (no external system dependencies)
Randomized supplier names, quantities, prices, and GST rates

Generated Data Structure:

data/synthetic_invoices/
├── annotations.jsonl          # JSONL: {image_path, extracted_data}
├── printed_gst/               # 125 GST-compliant invoices
├── tally_pdf/                 # 125 Tally PDF exports
├── handwritten/               # 125 handwritten photos
└── whatsapp/                  # 125 WhatsApp screenshots

Each invoice includes:

5–20 line items
Realistic pricing (₹10–₹5,000 per item)
Correct GST calculations (5%, 12%, 18%)
Real supplier names + product abbreviations
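The ground-truth side of the generator can be sketched without any rendering. Each invoice is built as a dict matching the output schema shown earlier, and one `{image_path, extracted_data}` record per line is appended to `annotations.jsonl`. Below is a minimal sketch. It assumes:

- A quantity range of 1 to 50.
- Pre-GST line totals.
- `invoice_total` = subtotal + GST.

The four-item catalog, the placeholder supplier names, and the file-naming pattern are hypothetical stand-ins for what `generate_invoices.py` actually uses. Pillow rendering is omitted.

```python
import json
import random

GST_RATES = [5, 12, 18]  # rates used by the generator, per this card
FORMATS = ["printed_gst", "tally_pdf", "handwritten", "whatsapp"]


def make_invoice(rng: random.Random, catalog: list[str], idx: int) -> dict:
    """Build one synthetic invoice's ground-truth annotation (no image rendering)."""
    items = []
    for _ in range(rng.randint(5, 20)):          # 5-20 line items
        qty = rng.randint(1, 50)                  # assumed quantity range
        price = round(rng.uniform(10, 5000), 2)   # Rs 10 to Rs 5,000 per item
        items.append({
            "raw_name": rng.choice(catalog),
            "quantity": qty,
            "unit_price": price,
            "gst_rate": rng.choice(GST_RATES),
            "total": round(qty * price, 2),       # assumed pre-GST line total
        })
    subtotal = round(sum(it["total"] for it in items), 2)
    gst_total = round(sum(it["total"] * it["gst_rate"] / 100 for it in items), 2)
    return {
        "supplier": f"Distributor {idx:03d}",     # placeholder, not a real supplier
        "invoice_number": f"INV-{idx:03d}",
        "line_items": items,
        "gst_total": gst_total,
        "invoice_total": round(subtotal + gst_total, 2),
    }


def write_annotations(path: str, n: int = 500, seed: int = 42) -> None:
    """Write n JSONL records, cycling through the four formats."""
    rng = random.Random(seed)
    catalog = ["MAGGI NDL 70GM", "SURF XL 1K", "AMUL BTR 100", "COLGAT 100G"]
    with open(path, "w", encoding="utf-8") as f:
        for idx in range(n):
            fmt = FORMATS[idx % len(FORMATS)]     # 125 per format when n=500
            record = {
                "image_path": f"{fmt}/invoice_{idx:04d}.png",
                "extracted_data": make_invoice(rng, catalog, idx),
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```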

Quick Start

Installation

git clone https://github.com/naazimsnh02/kirana-detective.git
cd kirana-detective
pip install -r requirements.txt

Run Fine-tuning on Modal

# Set environment variables
export ROBOFLOW_API_KEY=<your-roboflow-api-key>
export HF_TOKEN=<your-huggingface-token>
modal token new

# Generate synthetic invoices
modal run finetune/generate_invoices.py

# Fine-tune all three models (sequential)
modal run finetune/train_minicpm_v.py           # ~52 min (actual run)
modal run finetune/train_minicpm5_1b.py         # ~1 hour
modal run finetune/train_yolo26n.py             # ~2 hours

Models are auto-published to HuggingFace Hub upon completion.

Local Inference

MiniCPM-V (Invoice Extraction):

The extractor is published as merged Transformers weights (bfloat16) rather than a GGUF file. Load it with `AutoModel` and `model.chat()`, as shown under Usage in Production (Kirana Detective App).

MiniCPM5-1B (Product Normalization):

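from transformers import AutoTokenizer, AutoModelForCausalLM
repo = "build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
# Assumed prompt wording; match the template used in finetune/train_minicpm5_1b.py
inputs = tokenizer("Normalize: MAGGI NDL 70GM\nCanonical:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))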

YOLO26n (Object Detection):

from ultralytics import YOLO
model = YOLO("yolo26n_fmcg.onnx", task="detect")  # ONNX export; state the task explicitly
results = model.predict("shelf.jpg", imgsz=640)
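The "count inventory" step amounts to tallying detected boxes per class. Below is a minimal sketch with a hypothetical helper, `count_products`. It takes plain lists so it runs without a model; the comment shows how the lists would be pulled from an Ultralytics `Results` object.

Counting one box per unit assumes each item is visible and detected separately, without occlusion or stacking. Ultralytics already applies a 0.25 confidence threshold by default during `predict`, so the extra threshold only matters if you raise it.

```python
from collections import Counter


def count_products(class_ids, confidences, names, conf_threshold=0.25):
    """Count detections per class name, ignoring boxes below the confidence threshold."""
    counts = Counter()
    for cls_id, conf in zip(class_ids, confidences):
        if conf >= conf_threshold:
            counts[names[int(cls_id)]] += 1
    return dict(counts)


# With Ultralytics, the inputs would come from the first result, e.g.
#   r = results[0]
#   count_products(r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.names)
names = {0: "Bournvita", 1: "Society Tea"}
print(count_products([0, 0, 1, 1], [0.91, 0.40, 0.88, 0.12], names))
# → {'Bournvita': 2, 'Society Tea': 1}
```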

Evaluation & Performance

MiniCPM-V Training Metrics (Actual Run — June 10, 2026)

Epoch	Train Loss	Eval Loss	LR
1	6.081	0.2901	8.83e-5
2	3.948	0.2281	4.94e-5
3	3.326	0.212	1.04e-5

Training time: 51 min 50 sec (87 steps, 26 s/step on A10G)
Avg gradient norm: 178 → 16 (stable convergence)
Best checkpoint loaded: epoch 3 (eval loss 0.212)
Final avg train loss across all steps: 4.774

Per-invoice-type breakdown (printed GST / Tally / handwritten / WhatsApp) pending a held-out real-invoice test set — to be added in Phase 2.

MiniCPM5-1B Evaluation

Metric	Value
Exact Match (normalized names)	94.5%
Fuzzy Match (normalized Levenshtein similarity > 0.8)	98.2%
OOV Handling	3.8% fail → manual review flag
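The fuzzy-match metric and the manual-review flag can both be expressed with a normalized Levenshtein similarity. Below is a minimal sketch. It assumes:

- Similarity = 1 − distance / length of the longer string.
- Comparison is case-insensitive.
- The cut-off is a strict `> 0.8`, as in the table.

`match_to_catalog` is a hypothetical helper, and the evaluation script may normalize differently. The two-item catalog is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - distance / longer length, so identical strings score 1.0."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def match_to_catalog(prediction: str, catalog: list[str], threshold: float = 0.8):
    """Return (best catalog name, score), or (None, score) to flag for manual review."""
    best = max(catalog, key=lambda name: similarity(prediction.lower(), name.lower()))
    score = similarity(prediction.lower(), best.lower())
    return (best if score > threshold else None), score


catalog = ["Amul Butter 100g", "Surf Excel Washing Powder 1kg"]
print(levenshtein("kitten", "sitting"))                 # → 3
print(match_to_catalog("Amul Buter 100g", catalog)[0])  # → Amul Butter 100g
print(match_to_catalog("Parle-G Biscuit", catalog)[0])  # → None
```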

YOLO26n Evaluation — Final Merged Run (3 datasets, 1,831 classes)

Metric	Value
mAP50 (all classes)	0.428
mAP50-95 (all classes)	0.302
Total classes	1,831
Validation images	1,236
Validation instances	13,443
Best epoch	100 (60 initial + 40 resumed)

YOLO26n — Pilot Run Per-Class Metrics (single dataset, 10 classes, superseded)

These metrics are from an earlier run on the agentsk47 dataset only. Shown for reference; the production model uses all 3 merged datasets.

Per-class metrics at best epoch (65):

Class	Precision	Recall	mAP50	mAP50-95
Bournvita	0.902	1.000	0.995	0.995
Mysore Sandal Soap	1.000	0.905	0.995	0.944
Nescafe Coffee	0.927	1.000	0.995	0.908
Nivea Body Lotion	0.935	1.000	0.995	0.923
Nivea Soft Cream	0.924	1.000	0.995	0.895
Parachute Coconut Oil	1.000	0.819	0.972	0.928
Patanjali Dant Kanti	1.000	0.985	0.995	0.971
Society Tea	0.878	1.000	0.995	0.845
Tresemmé Conditioner	0.814	1.000	0.995	0.995
Tresemmé Shampoo	0.968	1.000	0.995	0.922
Macro Average	0.935	0.971	0.993	0.933
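The Macro Average row is the unweighted mean over the ten classes, so each class counts equally regardless of how many instances it has. The snippet below recomputes that row from the per-class values as a consistency check on the table. It is not the Ultralytics validator's own averaging code.

```python
from statistics import mean

# Pilot-run values from the table: (precision, recall, mAP50, mAP50-95)
pilot = {
    "Bournvita": (0.902, 1.000, 0.995, 0.995),
    "Mysore Sandal Soap": (1.000, 0.905, 0.995, 0.944),
    "Nescafe Coffee": (0.927, 1.000, 0.995, 0.908),
    "Nivea Body Lotion": (0.935, 1.000, 0.995, 0.923),
    "Nivea Soft Cream": (0.924, 1.000, 0.995, 0.895),
    "Parachute Coconut Oil": (1.000, 0.819, 0.972, 0.928),
    "Patanjali Dant Kanti": (1.000, 0.985, 0.995, 0.971),
    "Society Tea": (0.878, 1.000, 0.995, 0.845),
    "Tresemmé Conditioner": (0.814, 1.000, 0.995, 0.995),
    "Tresemmé Shampoo": (0.968, 1.000, 0.995, 0.922),
}

# Transpose to one tuple per metric, then take the unweighted mean of each
macro = [round(mean(column), 3) for column in zip(*pilot.values())]
print(macro)  # → [0.935, 0.971, 0.993, 0.933]
```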

Known Limitations & Biases

MiniCPM-V (Invoice Extractor)

Limitation	Impact	Mitigation
Only 10 FMCG suppliers in training data	Fails on uncommon distributors (e.g., local regional suppliers)	Collect real invoices from more suppliers post-hackathon
Synthetic data (no image degradation, blur)	May struggle with poor-quality photos	Add augmentation (blur, noise, shadows) to training data
GST rates hardcoded (5%, 12%, 18%)	Misses 0% or 28% GST items	Parameterize GST rate extraction
English-only prompts	Cannot process invoices in regional languages	Add Hindi/Tamil/Marathi templates

MiniCPM5-1B (Product Normalizer)

Limitation	Impact	Mitigation
Synthetic augmentation only	Overfits to rule-based patterns; fails on real-world typos	Collect 200+ real invoices for retraining
200 SKU catalog	Fails on brands outside top 10 suppliers	Expand to 2,000 SKUs (all major Indian FMCG)
No regional abbreviations	Tamil/Hindi shortcuts not recognized	Add language-specific abbreviation models
No OEM rebrands	Misses store-brand relabeling	Add rebranding patterns post-research

YOLO26n (Product Detection)

Limitation	Impact	Mitigation
Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali)	May underperform on grocery staples (oils, spices, pulses)	Balance class distribution; add 40–50 grocery categories
~11K images across 3 datasets	May not generalize to unlisted brands or novel shelf layouts	Collect 50K+ images via Roboflow community
Confidence threshold (0.25) tuned for this dataset	May produce false positives in novel environments	Benchmark on held-out kirana store photos
YOLO26n is 8M params (nano)	Edge device deployment not yet tested	Quantize & benchmark on RPi 4, Android

Fairness & Bias Notes

Brand bias: Training data skews toward premium Indian brands (Amul, Nestlé, ITC) — may underperform on budget/regional brands
Supplier bias: Only 10 distributors represented; regional cooperatives not included
Language bias: All training prompts in English; non-English invoices will fail
Store-size bias: Assumes a typical kirana inventory of ₹5–50 lakh; very large or very small stores may see degraded performance

Reproducibility

Seed Control

All scripts use fixed seeds:

import random
import numpy as np
import torch
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)  # seeds the CPU and all CUDA devices

Roboflow Dataset Versions (Pinned)

agentsk47/indian-grocery-object-detection — v1 (May 2025)
iit-patna/grocery_items — v45 (Apr 2026)
project-c5ho0/indian-market — v2 (Jun 2025)

Training Infrastructure

Orchestration: Modal (serverless GPUs)
Fine-tuning Framework: Unsloth 2026.6.1 (LLM), Ultralytics (YOLO)
Quantization: llama.cpp (GGUF)
Model Publishing: Hugging Face Hub (`huggingface_hub>=0.30.0`)

Reproducibility Checklist

Dataset versions pinned in code
Random seeds fixed
Hardware specs documented (A10G, 22 GB VRAM)
Training duration recorded (~3 h 52 min total; see Hardware & Cost Estimates)
Evaluation metrics logged post-training
Cold start (fresh HF account) validation (TODO: test on new account)

Files in This Repository

kirana-invoice-train-data/
├── README.md                           # Project overview
├── MODEL_CARD.md                       # This model card (for HF Hub)
├── requirements.txt                    # Python dependencies
│
├── finetune/
│   ├── README.md                       # Training workflow guide
│   ├── generate_invoices.py            # Synthetic invoice generator (500 images)
│   ├── train_minicpm_v.py              # Fine-tune MiniCPM-V (OCR)
│   ├── train_minicpm5_1b.py            # Fine-tune MiniCPM5-1B (normalizer)
│   ├── train_yolo26n.py                # Fine-tune YOLO26n (detection)
│   ├── export_minicpm_v_gguf.py        # Merge LoRA → push merged HF weights
│   ├── push_minicpm_v_to_hf.py         # Push MiniCPM-V LoRA adapter to HF Hub
│   ├── push_minicpm_v_merged_card.py   # Update MiniCPM-V merged model card on HF
│   ├── push_yolo_to_hf.py              # Push YOLO artifacts from Modal volume to HF
│   └── upload_yolo_to_hf.py            # Upload YOLO artifacts from local disk to HF
│
├── data/
│   ├── fmcg_catalog.json               # 200 canonical SKU names + GST rates
│   └── synthetic_invoices/
│       ├── annotations.jsonl
│       ├── printed_gst/                # 125 invoices
│       ├── tally_pdf/                  # 125 invoices
│       ├── handwritten/                # 125 invoices
│       └── whatsapp/                   # 125 invoices
│
└── tests/
    └── test_*.py                       # Unit & integration tests

Hardware & Cost Estimates

Training Cost (Modal On-Demand)

Model	GPU	Duration	On-Demand Cost
MiniCPM-V	NVIDIA A10G	~52 min (actual)	~$1.30
MiniCPM5-1B	NVIDIA A10G	~1 hour	~$1.50
YOLO26n	NVIDIA A10G	~2 hours	~$3.00
Total	—	~3h 52min (actual)	~$5.80

Inference Hardware

Laptop CPU (Intel i7): ~5–10 sec/invoice (MiniCPM-V) + ~2 sec/normalization + ~3 sec/image (YOLO)
GPU (NVIDIA RTX 3080): ~0.5 sec/invoice + ~0.2 sec/normalization + ~0.1 sec/image
Edge Device (Raspberry Pi 4): not yet benchmarked; rough estimate for a quantized YOLO26n is 30–60 sec/image

Usage in Production (Kirana Detective App)

The app downloads the MiniCPM-V extractor from the Hugging Face Hub on first run:

import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

# Merged weights — no PEFT required
model = AutoModel.from_pretrained(
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
    trust_remote_code=True,
)

# Inference (convert to RGB in case the photo has an alpha channel or is grayscale)
image = Image.open("invoice.jpg").convert("RGB")
msgs = [{"role": "user", "content": [image, "Extract all line items as JSON."]}]
response = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False, max_new_tokens=2048)
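`model.chat()` returns free text. Chat models sometimes wrap JSON in a Markdown code fence or add a sentence before it. Below is a minimal post-processing helper, `parse_invoice_json`. It is hypothetical and not part of the MiniCPM-V API; it pulls out the object before it reaches the downstream agents.

It assumes the reply holds a single top-level JSON object, taken from the first `{` to the last `}`. Malformed JSON still raises `json.JSONDecodeError`, which is a subclass of `ValueError`.

```python
import json


def parse_invoice_json(response: str) -> dict:
    """Parse the JSON object in a chat reply, tolerating code fences or prose around it.

    Takes the span from the first '{' to the last '}', so it assumes the reply
    contains exactly one top-level JSON object.
    """
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    return json.loads(response[start:end + 1])


reply = 'Here is the data:\n{"supplier": "Distributor Name", "line_items": []}\nDone.'
print(parse_invoice_json(reply)["supplier"])  # → Distributor Name
```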

Next Steps & Roadmap

Phase 2 (Q3 2026)

Collect 500 real invoices from partnered kirana stores
Expand product taxonomy: 200 SKUs → 2,000 SKUs
Add regional language support (Hindi, Tamil, Marathi, Kannada)
Fine-tune on invoice degradation (blur, folds, stains)
Benchmark on edge devices (Raspberry Pi, Android)

Phase 3 (Q4 2026)

Multi-language MiniCPM5-1B normalizer
Expand YOLO26n to 50–100 classes (full grocery taxonomy)
Real-time video product counting via YOLO
Mobile app (React Native) with offline inference

Research Questions

How do models perform on store-private labels vs. branded products?
Can we detect counterfeit products via label anomalies?
What is the fairness gap for regional vs. national brands?

Licensing & Attribution

Code: MIT License
Models:
- MiniCPM-V: openbmb/MiniCPM-V — Apache 2.0
- MiniCPM5-1B: openbmb/MiniCPM5-1B — Apache 2.0
- YOLO26n: Ultralytics YOLOv8 — AGPL-3.0
Datasets:
- Roboflow datasets: Individual licenses (CC BY 4.0, CC BY-SA 4.0) — check each repo
- Synthetic invoices: CC0 (public domain)

Contributing

Contributions welcome! Areas of need:

Real invoice collection: Partner kirana stores to share anonymized invoices
Regional language templates: Hindi, Tamil, Marathi invoice formats
Edge device benchmarks: Profile inference on RPi 4, Snapdragon, etc.
Dataset expansion: Add 1,000+ more products to YOLO26n training
Fairness audits: Test models on regional/budget brands

Contact & Support

Citation

If you use this repository or models in your work, please cite:

@misc{kirana_detective_2026,
  author = {Hussain, Syed Naazim},
  title = {Kirana Detective: Fine-Tuned Models for Indian Grocery Invoice Auditing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged}},
}

Version: 1.0