# MODEL CARD: Kirana Detective Training Data & Fine-Tuned Models

**Repository**: `build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`  
**Author**: [naazimsnh02](https://github.com/naazimsnh02)  
**License**: Apache 2.0 (MiniCPM models) / AGPL-3.0 (YOLO26n, Ultralytics-derived) / MIT (code)  
**Last Updated**: June 10, 2026

---

## Executive Summary

**Kirana Detective** is a complete fine-tuning pipeline for three compact models that together audit distributor invoices for Indian kirana (grocery) stores. This repository contains:

1. **Synthetic invoice generation** (500 images across 4 formats)
2. **Fine-tuned MiniCPM-V 4.6** — Invoice OCR & extraction (transformers, merged weights)
3. **Fine-tuned MiniCPM5-1B** — Product name normalization (GGUF)
4. **Fine-tuned YOLO26n** — Visual product detection (ONNX)

All models run **locally without cloud APIs** and are deployed in a six-agent pipeline to detect pricing anomalies, missing deliveries, and GST errors, reporting **estimated rupee leakage** with actionable corrections.

---

## Project Overview

### Problem Statement

Indian kirana store owners struggle to audit distributor invoices manually:
- Inconsistent product naming (abbreviations, typos, regional variants)
- Difficulty cross-referencing against inventory
- Manual photo counting is error-prone
- No standardized format for pricing lookups
- Estimated financial leakage: **5–15% of purchase budget**

### Solution

**Kirana Detective** automates the entire audit pipeline:
1. **Extract** line items from invoice images (MiniCPM-V)
2. **Normalize** product names (MiniCPM5-1B)
3. **Check prices** against catalog
4. **Count inventory** from delivery photos (YOLO26n)
5. **Reconcile** invoiced vs. counted quantities
6. **Report** discrepancies with rupee impact
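
Steps 3 to 6 can be illustrated with a minimal sketch of the reconciliation logic. It assumes line items have already been normalized to canonical names, that catalog prices are per unit, and that counted quantities come from the detection step. The function, dataclass, and field names (`reconcile`, `Finding`, `catalog_price`, `counted`) are hypothetical and not taken from the Kirana Detective code, and the prices are illustrative numbers only.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    product: str
    kind: str            # "overcharge" or "short_delivery"
    rupee_impact: float


def reconcile(line_items, catalog_price, counted):
    """Hypothetical audit step: compare invoice lines against a price catalog
    and against counted stock, returning findings with estimated rupee impact."""
    findings = []
    for item in line_items:
        name, qty, price = item["name"], item["quantity"], item["unit_price"]
        # Price check: charged more than the catalog unit price.
        expected = catalog_price.get(name)
        if expected is not None and price > expected:
            findings.append(Finding(name, "overcharge", round((price - expected) * qty, 2)))
        # Quantity check: fewer units counted than invoiced.
        seen = counted.get(name, 0)
        if seen < qty:
            findings.append(Finding(name, "short_delivery", round((qty - seen) * price, 2)))
    return findings


items = [
    {"name": "Nestle Maggi Masala Noodles 70g", "quantity": 10, "unit_price": 14.0},
    {"name": "Amul Butter 100g", "quantity": 5, "unit_price": 58.0},
]
catalog = {"Nestle Maggi Masala Noodles 70g": 14.0, "Amul Butter 100g": 56.0}
counts = {"Nestle Maggi Masala Noodles 70g": 8, "Amul Butter 100g": 5}
for finding in reconcile(items, catalog, counts):
    print(finding)
```

Valuing a short delivery at the invoiced price (rather than the catalog price) is a design choice here: it reports what the store actually paid for goods it did not receive.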

---

## Models in This Repository

### Model 1: MiniCPM-V 4.6 (Invoice Extractor)

| Attribute | Details |
|---|---|
| **Base Model** | [openbmb/MiniCPM-V-4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6) |
| **Task** | Vision-language OCR + structured extraction |
| **Fine-tuning Method** | QLoRA (4-bit quantization + LoRA rank 16) |
| **Training Data** | 500 synthetic invoices (450 train, 50 eval) |
| **Trainable Parameters** | 9,486,336 / 1,309,914,352 (0.72%) |
| **Output Format** | Merged full weights (bfloat16) |
| **Inference Runtime** | Transformers (`AutoModel`, `model.chat()`) |
| **Hardware (Training)** | NVIDIA A10G, 22 GB VRAM, 51 min 50 sec (actual) |
| **Repository** | [`build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`](https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged) |

**Input Formats Supported**:
- Printed GST invoices (Pillow-generated PDFs)
- Tally PDF exports
- Handwritten invoices (photos)
- WhatsApp screenshot invoices

**Output Structure** (JSON, abridged to a single line item):
```json
{
  "supplier": "Distributor Name",
  "invoice_number": "INV-001",
  "line_items": [
    {
      "raw_name": "MAGGI NDL 70GM",
      "quantity": 10,
      "unit_price": 45.50,
      "gst_rate": 5,
      "total": 455.00
    }
  ],
  "invoice_total": 9650.00,
  "gst_total": 485.00
}
```
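
Because the extractor's output feeds the price and GST checks, a cheap arithmetic sanity pass can catch misread digits before reconciliation. The sketch below is an assumption, not part of the published pipeline: it treats `total` as the pre-GST amount (`quantity × unit_price`, as in the example above, where 10 × 45.50 = 455.00) and only accepts the GST slabs used by the synthetic generator. `check_invoice` and the tolerance value are hypothetical.

```python
ALLOWED_GST_RATES = {5, 12, 18}  # slabs used in the synthetic training data


def check_invoice(invoice, tol=0.01):
    """Return human-readable problems found in an extracted invoice dict."""
    problems = []
    for i, item in enumerate(invoice.get("line_items", [])):
        # Line total should equal quantity x unit price (pre-GST).
        expected = round(item["quantity"] * item["unit_price"], 2)
        if abs(expected - item["total"]) > tol:
            problems.append(f"line {i}: total {item['total']} != {expected}")
        # GST rate outside the known slabs usually means a misread digit.
        if item["gst_rate"] not in ALLOWED_GST_RATES:
            problems.append(f"line {i}: unexpected GST rate {item['gst_rate']}%")
    return problems


sample = {"line_items": [{"raw_name": "MAGGI NDL 70GM", "quantity": 10,
                          "unit_price": 45.50, "gst_rate": 5, "total": 455.00}]}
print(check_invoice(sample))  # → []
```

Invoice-level totals are deliberately not checked, since the card does not specify whether `invoice_total` includes GST.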

---

### Model 2: MiniCPM5-1B (Product Name Normalizer)

| Attribute | Details |
|---|---|
| **Base Model** | [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) |
| **Task** | Text-to-text product name normalization |
| **Fine-tuning Method** | QLoRA (4-bit base, LoRA rank 16) |
| **Training Data** | 2,000 synthetic (raw, canonical) pairs (1,800 train, 200 eval) |
| **Output Format** | GGUF (quantized, ~1.2 GB) |
| **Framework** | Unsloth 2026.6.1 |
| **Hardware (Training)** | NVIDIA A10G, 22 GB VRAM, ~1 hour |
| **Repository** | [`build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`](https://huggingface.co/build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer) |

**Example Mappings**:
| Raw Input | Normalized Output |
|---|---|
| `MAGGI NDL 70GM` | Nestle Maggi Masala Noodles 70g |
| `SURF XL 1K` | Surf Excel Washing Powder 1kg |
| `AMUL BTR 100` | Amul Butter 100g |
| `COLGAT 100G` | Colgate Strong Teeth Toothpaste 100g |

**Training Data**:
- Hand-curated catalog of 200 Indian FMCG SKUs
- Augmentation strategies: abbreviation expansion, typo injection, truncation, regional shorthand
- Covers 10 major FMCG suppliers, including ITC, Nestlé, Unilever, P&G, Reckitt, Britannia, Amul, and Patanjali
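
The augmentation strategies listed above can be illustrated with a small generator of (raw, canonical) pairs. This is a hypothetical sketch, not the project's generator: the abbreviation table, the single-character-deletion typo model, the truncation rule, and all function names are assumptions chosen for illustration, and regional shorthand is omitted.

```python
import random

# Hypothetical abbreviation table; the real catalog and rules may differ.
ABBREVIATIONS = {"noodles": "NDL", "butter": "BTR", "washing": "WSH", "powder": "PWD"}


def abbreviate(name):
    # Replace known words with distributor-style abbreviations.
    return " ".join(ABBREVIATIONS.get(w.lower(), w) for w in name.split())


def inject_typo(name, rng):
    # Drop one random character (a deliberately simple typo model).
    if len(name) < 2:
        return name
    i = rng.randrange(len(name))
    return name[:i] + name[i + 1:]


def truncate(name, keep_words=2):
    # Keep only the first few words, mimicking truncated item names.
    return " ".join(name.split()[:keep_words])


def make_pairs(canonical_names, n_per_name=3, seed=42):
    """Generate (raw, canonical) training pairs with a fixed seed."""
    rng = random.Random(seed)
    strategies = [abbreviate, lambda s: inject_typo(s, rng), truncate]
    pairs = []
    for canonical in canonical_names:
        for _ in range(n_per_name):
            raw = rng.choice(strategies)(canonical).upper()
            pairs.append((raw, canonical))
    return pairs


print(make_pairs(["Amul Butter 100g"], n_per_name=2))
```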

---

### Model 3: YOLO26n (Product Detection)

| Attribute | Details |
|---|---|
| **Base Model** | [YOLOv8 Nano](https://docs.ultralytics.com/tasks/detect/) |
| **Task** | Object detection (product localization & counting) |
| **Fine-tuning Method** | Supervised fine-tuning via Ultralytics |
| **Training Data** | 3 Roboflow datasets merged (~11,800 images) |
| **Output Format** | ONNX (opset 12, ~15 MB) + PyTorch checkpoint |
| **Framework** | Ultralytics YOLOv8 |
| **Hardware (Training)** | NVIDIA A10G, 100 epochs (60 + 40 resumed after restart) |
| **Repository** | [`build-small-hackathon/yolo26n-indian-fmcg-detection`](https://huggingface.co/build-small-hackathon/yolo26n-indian-fmcg-detection) |

**Classes**: Unified class list built dynamically by merging all three dataset vocabularies (insertion-order dedup). The merged dataset spans **1,831 classes** across grocery staples, personal care, beverages, and packaged foods. Full list in `class_names.json`.
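
Merging vocabularies with insertion-order deduplication implies that each dataset's local class IDs must be remapped to indices in the unified list before training. A minimal sketch of that step, assuming YOLO-format label lines (`class_id cx cy w h`); `merge_vocabularies` and `remap_label_line` are hypothetical names, the class names in the example are illustrative, and the project may also normalize class-name spelling before deduplication.

```python
def merge_vocabularies(vocabularies):
    """Merge per-dataset class lists, keeping first-seen order.

    Returns the unified class list and, for each dataset, a dict that maps
    its local class id to the unified class id.
    """
    unified, index = [], {}
    remaps = []
    for names in vocabularies:
        remap = {}
        for local_id, name in enumerate(names):
            if name not in index:          # first occurrence wins
                index[name] = len(unified)
                unified.append(name)
            remap[local_id] = index[name]
        remaps.append(remap)
    return unified, remaps


def remap_label_line(line, remap):
    # YOLO label line: "<class_id> <cx> <cy> <w> <h>" (normalized coordinates).
    class_id, *coords = line.split()
    return " ".join([str(remap[int(class_id)]), *coords])


classes, remaps = merge_vocabularies([["maggi", "amul-butter"], ["amul-butter", "surf-excel"]])
print(classes)                                           # ['maggi', 'amul-butter', 'surf-excel']
print(remap_label_line("1 0.5 0.5 0.2 0.3", remaps[1]))  # 2 0.5 0.5 0.2 0.3
```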

**Evaluation (merged 3-dataset run — final):**

| Metric | Value |
|---|---|
| mAP50 (all classes) | **0.428** |
| mAP50-95 (all classes) | **0.302** |
| Total classes | 1,831 |
| Validation images | 1,236 |
| Validation instances | 13,443 |

> **Pilot run note (superseded)**: A prior single-dataset run (agentsk47 only, 10 classes) achieved mAP50 = 0.993 / mAP50-95 = 0.933. Those metrics do not apply to the full merged model.

**Datasets Merged**:
1. [agentsk47/indian-grocery-object-detection](https://universe.roboflow.com/agentsk47/indian-grocery-object-detection-mfsnx) — v1, ~400 images, 10 classes
2. [iit-patna/grocery_items](https://universe.roboflow.com/iit-patna-qg1jh/grocery_items-7i2em) — v45, 6,695 images, 20 classes
3. [project-c5ho0/indian-market](https://universe.roboflow.com/project-c5ho0/indian-market-qieug) — v2, 4,694 images, 2 classes

---

## Training Data & Datasets

### Synthetic Invoice Generation (`generate_invoices.py`)

**Purpose**: Create diverse, realistic invoice images with ground-truth labels, without manual collection or hand transcription.

**Configuration**:
- 500 total invoices generated
- 4 formats: GST invoices, Tally PDFs, handwritten samples, WhatsApp screenshots
- Rendered with Pillow only (no other native dependencies)
- Randomized supplier names, quantities, prices, and GST rates

**Generated Data Structure**:
```
data/synthetic_invoices/
├── annotations.jsonl          # JSONL: {image_path, extracted_data}
├── printed_gst/               # 125 GST-compliant invoices
├── tally_pdf/                 # 125 Tally PDF exports
├── handwritten/               # 125 handwritten photos
└── whatsapp/                  # 125 WhatsApp screenshots
```

Each invoice includes:
- 5–20 line items
- Realistic pricing (₹10–₹5,000 per item)
- Correct GST calculations (5%, 12%, 18%)
- Real supplier names + product abbreviations
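
The data side of one synthetic invoice (before rendering with Pillow) can be sketched as follows. It follows the ranges listed above (5–20 line items, ₹10–₹5,000 unit prices, 5/12/18% GST) and mirrors the `{image_path, extracted_data}` annotation format. The field names inside `extracted_data` follow the Output Structure example; the quantity range, the placeholder supplier and product names, the image path, and the assumption that `invoice_total` includes GST are all hypothetical.

```python
import json
import random

GST_RATES = [5, 12, 18]


def make_invoice(products, supplier, invoice_number, rng):
    """Hypothetical generator for the ground-truth JSON of one invoice."""
    items = []
    for raw_name in rng.sample(products, k=rng.randint(5, min(20, len(products)))):
        qty = rng.randint(1, 24)                 # assumed quantity range
        price = round(rng.uniform(10, 5000), 2)  # ₹10–₹5,000 per item
        items.append({
            "raw_name": raw_name,
            "quantity": qty,
            "unit_price": price,
            "gst_rate": rng.choice(GST_RATES),
            "total": round(qty * price, 2),      # pre-GST line amount
        })
    taxable = round(sum(it["total"] for it in items), 2)
    gst = round(sum(it["total"] * it["gst_rate"] / 100 for it in items), 2)
    return {
        "supplier": supplier,
        "invoice_number": invoice_number,
        "line_items": items,
        "invoice_total": round(taxable + gst, 2),  # assumed to include GST
        "gst_total": gst,
    }


rng = random.Random(42)
products = [f"ITEM {i:03d}" for i in range(30)]  # placeholder product names
record = {"image_path": "printed_gst/inv_0001.png",  # hypothetical path
          "extracted_data": make_invoice(products, "Example Distributors", "INV-0001", rng)}
print(json.dumps(record)[:120])
```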

---

## Quick Start

### Installation

```bash
git clone https://github.com/naazimsnh02/kirana-detective.git
cd kirana-detective
pip install -r requirements.txt
```

### Run Fine-tuning on Modal

```bash
# Set environment variables
export ROBOFLOW_API_KEY=<your-roboflow-api-key>
export HF_TOKEN=<your-huggingface-token>
modal token new

# Generate synthetic invoices
modal run finetune/generate_invoices.py

# Fine-tune all three models (sequential)
modal run finetune/train_minicpm_v.py           # ~52 min (actual run)
modal run finetune/train_minicpm5_1b.py         # ~1 hour
modal run finetune/train_yolo26n.py             # ~2 hours
```

Models are auto-published to HuggingFace Hub upon completion.

### Local Inference

**MiniCPM-V (Invoice Extraction)**:
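```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer
repo = "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged"  # merged Transformers weights, no GGUF
model = AutoModel.from_pretrained(repo, trust_remote_code=True, torch_dtype=torch.bfloat16).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
msgs = [{"role": "user", "content": [Image.open("invoice.png").convert("RGB"), "Extract invoice data as JSON."]}]
print(model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False))
```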

**MiniCPM5-1B (Product Normalization)**:
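```python
from transformers import AutoTokenizer, AutoModelForCausalLM
repo = "build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer"  # assumes Transformers-format weights; GGUF-only repos need gguf_file=...
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
prompt = tokenizer.apply_chat_template([{"role": "user", "content": "Normalize: MAGGI NDL 70GM"}], tokenize=False, add_generation_prompt=True)  # illustrative prompt
ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```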

**YOLO26n (Object Detection)**:
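```python
from ultralytics import YOLO

model = YOLO("yolo26n_fmcg.onnx", task="detect")  # ONNX export; set the task explicitly
results = model.predict("shelf.jpg", imgsz=640, conf=0.25)
print(len(results[0].boxes), "products detected")
```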
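
To turn detections into the per-product counts used for reconciliation, boxes can be tallied by class name above a confidence threshold. A minimal sketch, assuming the standard Ultralytics `Results` attributes (`boxes.cls`, `boxes.conf`, `names`); `count_by_class` is a hypothetical helper, and the 0.25 default mirrors the threshold noted under Known Limitations.

```python
from collections import Counter


def count_by_class(class_ids, confidences, names, min_conf=0.25):
    """Count detections per class name, ignoring low-confidence boxes."""
    counts = Counter()
    for cls_id, conf in zip(class_ids, confidences):
        if conf >= min_conf:
            counts[names[int(cls_id)]] += 1
    return dict(counts)


# With Ultralytics results:
#   r = results[0]
#   counts = count_by_class(r.boxes.cls.tolist(), r.boxes.conf.tolist(), r.names)
print(count_by_class([0, 0, 1, 1], [0.9, 0.8, 0.3, 0.1], {0: "Bournvita", 1: "Society Tea"}))
# → {'Bournvita': 2, 'Society Tea': 1}
```

Box counts only approximate unit counts: stacked or occluded packs may be missed, so shortfalls flagged this way are best treated as candidates for review.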

---

## Evaluation & Performance

### MiniCPM-V Training Metrics (Actual Run — June 10, 2026)

| Epoch | Train Loss | Eval Loss | LR |
|---|---|---|---|
| 1 | 6.081 | 0.2901 | 8.83e-5 |
| 2 | 3.948 | 0.2281 | 4.94e-5 |
| 3 | 3.326 | **0.212** | 1.04e-5 |

- Training time: 51 min 50 sec (87 steps, 26 s/step on A10G)
- Avg gradient norm: 178 → 16 (stable convergence)
- Best checkpoint loaded: epoch 3 (eval loss 0.212)
- Final avg train loss across all steps: 4.774

> Per-invoice-type breakdown (printed GST / Tally / handwritten / WhatsApp) pending a held-out real-invoice test set — to be added in Phase 2.

### MiniCPM5-1B Evaluation

| Metric | Value |
|---|---|
| Exact Match (normalized names) | 94.5% |
| Fuzzy Match (Levenshtein similarity > 0.8) | 98.2% |
| OOV Handling | 3.8% failures, flagged for manual review |
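
A fuzzy-match threshold of 0.8 only makes sense for a length-normalized Levenshtein similarity (1 − distance / length of the longer string), since raw edit distance is an integer. The sketch below computes that score and the exact/fuzzy match rates under this assumption. The normalization, the whitespace stripping, and the function names are assumptions, and the project's evaluation script may define them differently.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    # 1.0 for identical strings, 0.0 for completely different ones.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def match_rates(predictions, references, threshold=0.8):
    """Return (exact-match rate, fuzzy-match rate) over paired outputs."""
    pairs = [(p.strip(), r.strip()) for p, r in zip(predictions, references)]
    exact = sum(p == r for p, r in pairs) / len(pairs)
    fuzzy = sum(similarity(p, r) > threshold for p, r in pairs) / len(pairs)
    return exact, fuzzy


print(match_rates(["Amul Butter 100g", "Amul Buter 100g"],
                  ["Amul Butter 100g", "Amul Butter 100g"]))
# → (0.5, 1.0)
```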

### YOLO26n Evaluation — Final Merged Run (3 datasets, 1,831 classes)

| Metric | Value |
|---|---|
| mAP50 (all classes) | **0.428** |
| mAP50-95 (all classes) | **0.302** |
| Total classes | 1,831 |
| Validation images | 1,236 |
| Validation instances | 13,443 |
| Best epoch | 100 (60 initial + 40 resumed) |

### YOLO26n — Pilot Run Per-Class Metrics (single dataset, 10 classes, superseded)

> These metrics are from an earlier run on the `agentsk47` dataset only. Shown for reference; the production model uses all 3 merged datasets.

Per-class metrics at best epoch (65):

| Class | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|
| Bournvita | 0.902 | 1.000 | 0.995 | 0.995 |
| Mysore Sandal Soap | 1.000 | 0.905 | 0.995 | 0.944 |
| Nescafe Coffee | 0.927 | 1.000 | 0.995 | 0.908 |
| Nivea Body Lotion | 0.935 | 1.000 | 0.995 | 0.923 |
| Nivea Soft Cream | 0.924 | 1.000 | 0.995 | 0.895 |
| Parachute Coconut Oil | 1.000 | 0.819 | 0.972 | 0.928 |
| Patanjali Dant Kanti | 1.000 | 0.985 | 0.995 | 0.971 |
| Society Tea | 0.878 | 1.000 | 0.995 | 0.845 |
| Tresemmé Conditioner | 0.814 | 1.000 | 0.995 | 0.995 |
| Tresemmé Shampoo | 0.968 | 1.000 | 0.995 | 0.922 |
| **Macro Average** | **0.935** | **0.971** | **0.993** | **0.933** |

---

## Known Limitations & Biases

### MiniCPM-V (Invoice Extractor)
| Limitation | Impact | Mitigation |
|---|---|---|
| Only 10 FMCG suppliers in training data | Fails on uncommon distributors (e.g., local regional suppliers) | Collect real invoices from more suppliers post-hackathon |
| Synthetic data (no image degradation, blur) | May struggle with poor-quality photos | Add augmentation (blur, noise, shadows) to training data |
| GST rates hardcoded (5%, 12%, 18%) | Misses 0% or 28% GST items | Parameterize GST rate extraction |
| English-only prompts | Cannot process invoices in regional languages | Add Hindi/Tamil/Marathi templates |

### MiniCPM5-1B (Product Normalizer)
| Limitation | Impact | Mitigation |
|---|---|---|
| Synthetic augmentation only | Overfits to rule-based patterns; fails on real-world typos | Collect 200+ real invoices for retraining |
| 200 SKU catalog | Fails on brands outside top 10 suppliers | Expand to 2,000 SKUs (all major Indian FMCG) |
| No regional abbreviations | Tamil/Hindi shortcuts not recognized | Add language-specific abbreviation models |
| No OEM rebrands | Misses store-brand relabeling | Add rebranding patterns post-research |

### YOLO26n (Product Detection)
| Limitation | Impact | Mitigation |
|---|---|---|
| Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali) | May underperform on grocery staples (oils, spices, pulses) | Balance class distribution; add 40–50 grocery categories |
| ~11K images across 3 datasets | May not generalize to unlisted brands or novel shelf layouts | Collect 50K+ images via Roboflow community |
| Confidence threshold (0.25) tuned for this dataset | May produce false positives in novel environments | Benchmark on held-out kirana store photos |
| YOLO26n is 8M params (nano) | Edge device deployment not yet tested | Quantize & benchmark on RPi 4, Android |

### Fairness & Bias Notes
- **Brand bias**: Training data skews toward premium Indian brands (Amul, Nestlé, ITC) — may underperform on budget/regional brands
- **Supplier bias**: Only 10 distributors represented; regional cooperatives not included
- **Language bias**: All training prompts in English; non-English invoices will fail
- **Store-size bias**: The pipeline assumes a typical kirana store (₹5–50 lakh inventory); very large or very small stores may see degraded performance

---

## Reproducibility

### Seed Control
All scripts use fixed seeds:
```python
import random

import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)  # seeds the CPU and all CUDA devices
```

### Roboflow Dataset Versions (Pinned)
- agentsk47/indian-grocery-object-detection — **v1** (May 2025)
- iit-patna/grocery_items — **v45** (Apr 2026)
- project-c5ho0/indian-market — **v2** (Jun 2025)

### Training Infrastructure
- **Orchestration**: [Modal](https://modal.com) (serverless GPUs)
- **Fine-tuning Framework**: Unsloth 2026.6.1 (LLM), Ultralytics (YOLO)
- **Quantization**: llama.cpp (GGUF)
- **Model Publishing**: HuggingFace Hub `huggingface_hub>=0.30.0`

### Reproducibility Checklist
- [x] Dataset versions pinned in code
- [x] Random seeds fixed
- [x] Hardware specs documented (A10G, 22 GB VRAM)
- [x] Training duration recorded (~3h 52min total; MiniCPM-V measured, others estimated)
- [x] Evaluation metrics logged post-training
- [ ] Cold start (fresh HF account) validation (TODO: test on new account)

---

## Files in This Repository

```
kirana-invoice-train-data/
├── README.md                           # This file
├── MODEL_CARD.md                       # Model card for HF Hub
├── requirements.txt                    # Python dependencies
│
├── finetune/
│   ├── README.md                       # Training workflow guide
│   ├── generate_invoices.py            # Synthetic invoice generator (500 images)
│   ├── train_minicpm_v.py              # Fine-tune MiniCPM-V (OCR)
│   ├── train_minicpm5_1b.py            # Fine-tune MiniCPM5-1B (normalizer)
│   ├── train_yolo26n.py                # Fine-tune YOLO26n (detection)
│   ├── export_minicpm_v_gguf.py        # Merge LoRA → push merged HF weights
│   ├── push_minicpm_v_to_hf.py         # Push MiniCPM-V LoRA adapter to HF Hub
│   ├── push_minicpm_v_merged_card.py   # Update MiniCPM-V merged model card on HF
│   ├── push_yolo_to_hf.py              # Push YOLO artifacts from Modal volume to HF
│   └── upload_yolo_to_hf.py            # Upload YOLO artifacts from local disk to HF
│
├── data/
│   ├── fmcg_catalog.json               # 200 canonical SKU names + GST rates
│   └── synthetic_invoices/
│       ├── annotations.jsonl
│       ├── printed_gst/                # 125 invoices
│       ├── tally_pdf/                  # 125 invoices
│       ├── handwritten/                # 125 invoices
│       └── whatsapp/                   # 125 invoices
│
└── tests/
    └── test_*.py                       # Unit & integration tests
```

---

## Hardware & Cost Estimates

### Training Cost (Modal On-Demand)

| Model | GPU | Duration | On-Demand Cost |
|---|---|---|---|
| MiniCPM-V | NVIDIA A10G | ~52 min (actual) | ~$1.30 |
| MiniCPM5-1B | NVIDIA A10G | ~1 hour | $1.50 |
| YOLO26n | NVIDIA A10G | ~2 hours | $3.00 |
| **Total** | — | **~3h 52min** (MiniCPM-V actual, others estimated) | **~$5.80** |

### Inference Hardware

- **Laptop CPU (Intel i7)**: ~5–10 sec/invoice (MiniCPM-V) + ~2 sec/normalization + ~3 sec/image (YOLO)
- **GPU (NVIDIA RTX 3080)**: ~0.5 sec/invoice + ~0.2 sec/normalization + ~0.1 sec/image
- **Edge Device (Raspberry Pi 4)**: quantized YOLO26n ≈ 30–60 sec/image (estimate, untested)

---

## Usage in Production (Kirana Detective App)

Models are downloaded on first run via:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

# Merged weights — no PEFT required
model = AutoModel.from_pretrained(
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged",
    trust_remote_code=True,
)

# Inference
image = Image.open("invoice.jpg").convert("RGB")  # ensure 3-channel input
msgs = [{"role": "user", "content": [image, "Extract all line items as JSON."]}]
response = model.chat(image=None, msgs=msgs, tokenizer=tokenizer, sampling=False, max_new_tokens=2048)
```
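
`model.chat()` returns plain text, so the JSON still has to be extracted from the reply; chat models sometimes wrap it in a Markdown fence or add a sentence around it. A defensive parsing sketch follows. `parse_invoice_json` is a hypothetical helper, and returning `None` (so the invoice can be routed to manual review) is a design assumption rather than documented app behaviour.

```python
import json
import re


def parse_invoice_json(reply):
    """Extract the first JSON object from a chat reply, or return None."""
    # Prefer the contents of a fenced block if the model emitted one.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", reply, re.DOTALL)
    text = fenced.group(1) if fenced else reply
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


print(parse_invoice_json('Here you go: {"invoice_number": "INV-001", "line_items": []} Done.'))
# → {'invoice_number': 'INV-001', 'line_items': []}
```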

---

## Next Steps & Roadmap

### Phase 2 (Q3 2026)
- [ ] Collect **500 real invoices** from partnered kirana stores
- [ ] Expand product taxonomy: 200 SKUs → 2,000 SKUs
- [ ] Add **regional language support** (Hindi, Tamil, Marathi, Kannada)
- [ ] Fine-tune on **invoice degradation** (blur, folds, stains)
- [ ] Benchmark on **edge devices** (Raspberry Pi, Android)

### Phase 3 (Q4 2026)
- [ ] Multi-language MiniCPM5-1B normalizer
- [ ] Expand YOLO26n to **50–100 classes** (full grocery taxonomy)
- [ ] Real-time video product counting via YOLO
- [ ] Mobile app (React Native) with offline inference

### Research Questions
- How do models perform on **store-private labels** vs. branded products?
- Can we detect **counterfeit products** via label anomalies?
- What is the **fairness gap** for regional vs. national brands?

---

## Licensing & Attribution

- **Code**: MIT License
- **Models**: 
  - MiniCPM-V: [openbmb/MiniCPM-V](https://github.com/OpenBMB/MiniCPM-V) — Apache 2.0
  - MiniCPM5-1B: [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) — Apache 2.0
  - YOLO26n: [Ultralytics YOLOv8](https://github.com/ultralytics/ultralytics) — AGPL-3.0
- **Datasets**:
  - Roboflow datasets: Individual licenses (CC BY 4.0, CC BY-SA 4.0) — check each repo
  - Synthetic invoices: CC0 (public domain)

---

## Contributing

Contributions welcome! Areas of need:

1. **Real invoice collection**: Partner kirana stores to share anonymized invoices
2. **Regional language templates**: Hindi, Tamil, Marathi invoice formats
3. **Edge device benchmarks**: Profile inference on RPi 4, Snapdragon, etc.
4. **Dataset expansion**: Add 1,000+ more products to YOLO26n training
5. **Fairness audits**: Test models on regional/budget brands

---

## Contact & Support

- **Author**: [naazimsnh02](https://github.com/naazimsnh02)
- **Issues**: [GitHub Issues](https://github.com/naazimsnh02/kirana-detective/issues)
- **HF Hub Models**: 
  - [`build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`](https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged)
  - [`build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`](https://huggingface.co/build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer)
  - [`build-small-hackathon/yolo26n-indian-fmcg-detection`](https://huggingface.co/build-small-hackathon/yolo26n-indian-fmcg-detection)

---

## Citation

If you use this repository or models in your work, please cite:

```bibtex
@misc{kirana_detective_2026,
  author = {Hussain, Syed Naazim},
  title = {Kirana Detective: Fine-Tuned Models for Indian Grocery Invoice Auditing},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged}},
}
```

---

**Version**: 1.0  
**Last Updated**: June 10, 2026