Spaces:

build-small-hackathon
/

kirana-detective

Sleeping

App Files Files Community

kirana-detective / README.md

naazimsnh02

Fix: shorten short_description to meet HF 60-char limit

6dbaf70 8 days ago

preview code

Raw

History Blame

13.9 kB

	---
	sdk: gradio
	sdk_version: 6.16.0
	app_file: app.py
	title: Kirana Detective AI
	short_description: AI invoice auditor for Indian kirana stores
	license: mit
	tags:
	- invoice-audit
	- llm
	- yolo
	- gguf
	- gradio
	- indian-fmcg
	- kirana
	- minicpm
	- multimodal
	- backyard-ai
	- local-first
	- fine-tuned
	- custom-ui
	- llama.cpp
	- open-trace
	- blog-post
	- openbmb
	- modal.com
	---

	<div align="center">

	# 🔍 Kirana Detective AI

	### AI-Powered Invoice & Inventory Auditor for Indian Kirana Stores

	Find where money is being lost — in under 60 seconds

	[![HF Space](https://img.shields.io/badge/🤗%20Space-Kirana%20Detective-blue)](https://huggingface.co/spaces/build-small-hackathon/kirana-detective)
	[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
	[![Python 3.11](https://img.shields.io/badge/Python-3.11-blue.svg)](https://python.org)
	[![Training: Modal](https://img.shields.io/badge/Training-Modal%20A10G-orange)](https://modal.com)
	[![Models: OpenBMB](https://img.shields.io/badge/Models-OpenBMB%20MiniCPM-purple)](https://huggingface.co/openbmb)
	[![Hackathon: Build Small 2026](https://img.shields.io/badge/Hackathon-HF%20Build%20Small%202026-yellow)](https://huggingface.co/build-small-hackathon)

	</div>

	---

	## What It Does

	Indian kirana store owners receive 3–5 distributor invoices every week via WhatsApp, printed bills, or Tally exports. Verifying them manually is impossible.

	Kirana Detective uploads an invoice + delivery photos and finds:

	\| Finding \| Example \|
	\|---\|---\|
	\| Price overcharge \| Surf Excel 1kg charged ₹255 — historical price ₹220 (+15.9%) \|
	\| Delivery shortage \| Invoice says 24 Coke bottles — photo shows 20 \|
	\| Duplicate charge \| Parle-G 80g appears twice on the same invoice \|
	\| GST mismatch \| Aashirvaad Atta billed at 12% instead of 5% \|

	Every finding converts to a rupee leakage number with an actionable follow-up step.

	---

	## Demo

	```
	Upload Invoice (photo/PDF/WhatsApp) + Delivery Photos (up to 5)
	↓
	Agent 1 — MiniCPM-V 4.6 extracts structured JSON from invoice image
	Agent 2 — MiniCPM5-1B normalises "SURF XL 1K" → "Surf Excel Washing Powder 1kg"
	Agent 3 — Rule engine checks price vs. stored invoice history
	Agent 4 — YOLO26n counts products in delivery photos
	Agent 5 — Reconciliation: invoice qty vs. counted qty → shortage flags
	Agent 6 — MiniCPM5-1B generates rupee savings report + action items
	↓
	₹ TOTAL LEAKAGE DETECTED: ₹858
	```

	---

	## Six-Agent Pipeline

	```
	┌──────────────────────────────────┐
	│ Agent 1 — Invoice Extractor │ MiniCPM-V 4.6 (merged, bfloat16)
	│ Invoice image/PDF → JSON │ OCR + structured field extraction
	└──────────────┬───────────────────┘
	↓
	┌──────────────────────────────────┐
	│ Agent 2 — Product Matcher │ MiniCPM5-1B (GGUF Q4_K_M)
	│ Raw names → canonical SKU IDs │ "MAGGI NDL" → Nestle Maggi 70g
	└──────────────┬───────────────────┘
	↓
	┌──────────────────────────────────┐
	│ Agent 3 — Pricing Agent │ Rule-based (SQLite history)
	│ Normalized invoice → price flags│ Detects overcharges & GST errors
	└──────────────┬───────────────────┘
	↓
	┌──────────────────────────────────┐
	│ Agent 4 — Visual Counter │ YOLO26n (ONNX, 1,831 classes)
	│ Delivery photos → product counts│ mAP50 = 0.428 on merged dataset
	└──────────────┬───────────────────┘
	↓
	┌──────────────────────────────────┐
	│ Agent 5 — Reconciliation Agent │ Rule-based
	│ Invoice qty vs. counted qty │ Shortage flags + ₹ loss
	└──────────────┬───────────────────┘
	↓
	┌──────────────────────────────────┐
	│ Agent 6 — Savings Agent │ MiniCPM5-1B (GGUF Q4_K_M)
	│ All flags → ₹ report + actions │ "Call HUL rep. Request credit note."
	└──────────────────────────────────┘
	```

	Every agent run is traced and logged for the Sharing is Caring badge.

	---

	## Fine-Tuned Models

	All three models were trained from scratch on Modal A10G GPUs and published to HuggingFace. Total training cost: ~$5.80.

	### Model 1 — MiniCPM-V 4.6 (Invoice Extractor)

	\| \| \|
	\|---\|---\|
	\| Repo \| [`build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`](https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged) \|
	\| Base \| openbmb/MiniCPM-V-4.6 \|
	\| Method \| QLoRA rank 16 (PEFT + bitsandbytes), then merged to full bfloat16 weights \|
	\| Data \| 500 synthetic Indian invoices — printed GST, Tally PDF, handwritten, WhatsApp \|
	\| Eval loss \| 0.212 (epoch 3 / 3) \|
	\| Training \| 51 min 50 sec on A10G · 87 steps · 9.5M trainable params (0.72%) \|
	\| Inference \| `transformers` AutoModel · `model.chat()` — no PEFT at runtime \|

	### Model 2 — MiniCPM5-1B (Product Normalizer)

	\| \| \|
	\|---\|---\|
	\| Repo \| [`build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`](https://huggingface.co/build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer) \|
	\| Base \| openbmb/MiniCPM5-1B \|
	\| Method \| QLoRA rank 16 via Unsloth, exported to GGUF Q4_K_M \|
	\| Data \| 2,000 synthetic (raw_name → canonical_name) pairs · 200 Indian FMCG SKUs \|
	\| Training \| ~1 hour on A10G \|
	\| Inference \| llama-cpp-python · `create_chat_completion()` \|

	### Model 3 — YOLO26n (Product Detector)

	\| \| \|
	\|---\|---\|
	\| Repo \| [`build-small-hackathon/yolo26n-indian-fmcg-detection`](https://huggingface.co/build-small-hackathon/yolo26n-indian-fmcg-detection) \|
	\| Base \| Ultralytics YOLO26n \|
	\| Method \| Supervised fine-tuning on 3 merged Roboflow datasets \|
	\| Data \| ~11,400 images · 1,831 unified classes \|
	\| Metrics \| mAP50 = 0.428 · mAP50-95 = 0.302 · 100 epochs (A10G) \|
	\| Inference \| ONNX Runtime · CPU or GPU \|

	### Training Dataset

	\| \| \|
	\|---\|---\|
	\| Repo \| [`build-small-hackathon/kirana-invoice-train-data`](https://huggingface.co/datasets/build-small-hackathon/kirana-invoice-train-data) \|
	\| Contents \| 500 synthetic invoice images (450 train / 50 eval) with structured JSON annotations \|

	---

	## Running Locally

	```bash
	git clone https://github.com/naazimsnh02/kirana-detective.git
	cd kirana-detective
	pip install -r requirements.txt
	python app.py
	```

	First run: downloads ~3 GB of model weights (cached after that).
	Requirements: ~6 GB RAM · Python 3.11 · optional CUDA GPU for faster MiniCPM-V inference.

	### Environment Variables

	\| Variable \| Required \| Purpose \|
	\|---\|---\|---\|
	\| `HF_TOKEN` \| Optional \| Faster downloads from HF Hub (avoids rate limits) \|

	---

	## Re-Training the Models

	All training scripts are in `finetune/`. Training is orchestrated on Modal.

	```bash
	export HF_TOKEN=<your-token>
	export ROBOFLOW_API_KEY=<your-key>

	modal run finetune/generate_invoices.py # ~10 min — generate 500 synthetic invoices
	modal run finetune/train_minicpm_v.py # ~52 min — fine-tune invoice extractor
	modal run finetune/export_minicpm_v_gguf.py # ~10 min — merge LoRA → push HF weights
	modal run finetune/train_minicpm5_1b.py # ~1 hour — fine-tune product normalizer
	modal run finetune/train_yolo26n.py # ~2 hours — fine-tune YOLO26n detector
	```

	Scripts publish to `naazimsnh02/` first; transfer to `build-small-hackathon/` manually after.
	See [`finetune/README.md`](finetune/README.md) for the full workflow.

	---

	## Model Architecture

	\| Component \| Model \| Parameters \| Runtime \|
	\|---\|---\|---\|---\|
	\| Invoice OCR & extraction \| MiniCPM-V 4.6 (merged) \| 1.3B \| transformers \|
	\| Product normalisation + report \| MiniCPM5-1B (GGUF Q4_K_M) \| 1.08B \| llama-cpp-python \|
	\| Product detection & counting \| YOLO26n (ONNX) \| ~2.4M \| onnxruntime \|
	\| Total \| — \| ~2.38B \| — \|

	Comfortably within the Tiny Titan ≤4B threshold. Zero cloud API calls — fully local inference.

	---

	## Hackathon Badges

	\| Badge \| How \|
	\|---\|---\|
	\| 🎯 Well-Tuned \| 3 custom fine-tuned models published on HF Hub \|
	\| 🔌 Off the Grid \| 100% local inference — MiniCPM-V (transformers) + MiniCPM5-1B (GGUF) + YOLO26n (ONNX) \|
	\| 🦙 Llama Champion \| MiniCPM5-1B served via llama-cpp-python (GGUF Q4_K_M) \|
	\| 🎨 Off-Brand \| Custom Gradio UI — rupee savings cards, colour-coded anomaly flags \|
	\| 📡 Sharing is Caring \| Two public trace datasets: runtime audit traces per run + full Claude Code build sessions \|
	\| 📓 Field Notes \| "How I built an AI auditor for India's 12 million kirana stores" \|
	\| 🏋️ Tiny Titan \| ~2.38B total parameters — OCR + normalization + counting + report \|

	---

	## Sharing is Caring — Trace Datasets

	Two public datasets document everything that happened in this project:

	### 1. Runtime Audit Traces — `build-small-hackathon/kirana-detective-traces`

	Published automatically after every invoice audit run by [`tracer.py`](tracer.py). Each file records one complete pipeline execution — all six agents, inputs, outputs, and timings.

	### 2. Claude Code Build Sessions — `build-small-hackathon/kirana-detective-build-traces`

	The 11 raw Claude Code (Sonnet 4.6) JSONL sessions used to design, code, debug, and document this entire project — from blank repo to hackathon submission. Viewable in HF Data Studio's native agent trace viewer.

	\| Dataset \| Contents \| Format \|
	\|---\|---\|---\|
	\| [`build-small-hackathon/kirana-detective-traces`](https://huggingface.co/datasets/build-small-hackathon/kirana-detective-traces) \| Per-audit runtime traces (6 agents per run) \| JSONL, auto-published by app \|
	\| [`build-small-hackathon/kirana-detective-build-traces`](https://huggingface.co/datasets/build-small-hackathon/kirana-detective-build-traces) \| 11 Claude Code build sessions · ~8.9 MB \| Native JSONL trace viewer \|

	To upload build traces:

	```bash
	export HF_TOKEN=<your-token>
	python finetune/upload_build_traces.py
	```

	---

	## Project Structure

	```
	kirana-detective/
	├── app.py # Gradio + FastAPI server
	├── pipeline.py # AuditOrchestrator (6-agent runner)
	├── models.py # Dataclasses: InvoiceJSON, LeakageReport, etc.
	├── catalog.py # FMCG product catalog + alias lookup
	├── storage.py # SQLite price history + audit log
	├── tracer.py # Agent trace logging → HF Hub
	├── agents/
	│ ├── invoice_extractor.py # Agent 1 — MiniCPM-V 4.6
	│ ├── product_matcher.py # Agent 2 — MiniCPM5-1B (alias + LLM)
	│ ├── pricing_agent.py # Agent 3 — rule-based price checks
	│ ├── visual_counter.py # Agent 4 — YOLO26n ONNX
	│ ├── reconciliation_agent.py # Agent 5 — invoice vs. photo reconciliation
	│ └── savings_agent.py # Agent 6 — MiniCPM5-1B report generator
	├── finetune/
	│ ├── README.md # Training workflow guide
	│ ├── generate_invoices.py # Synthetic invoice generator
	│ ├── train_minicpm_v.py # Fine-tune MiniCPM-V
	│ ├── train_minicpm5_1b.py # Fine-tune MiniCPM5-1B
	│ ├── train_yolo26n.py # Fine-tune YOLO26n
	│ ├── export_minicpm_v_gguf.py# Merge LoRA → push HF weights
	│ └── upload_build_traces.py # Upload Claude Code sessions → HF Hub
	├── data/
	│ └── fmcg_catalog.json # 200 canonical SKU names + GST rates
	└── MODEL_CARD.md # Full training + evaluation documentation
	```

	---

	## Links

	- HF Space: [build-small-hackathon/kirana-detective](https://huggingface.co/spaces/build-small-hackathon/kirana-detective)
	- Training dataset: [build-small-hackathon/kirana-invoice-train-data](https://huggingface.co/datasets/build-small-hackathon/kirana-invoice-train-data)
	- Invoice extractor: [build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged](https://huggingface.co/build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged)
	- Product normalizer: [build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer](https://huggingface.co/build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer)
	- Product detector: [build-small-hackathon/yolo26n-indian-fmcg-detection](https://huggingface.co/build-small-hackathon/yolo26n-indian-fmcg-detection)
	- Full model card: [MODEL_CARD.md](MODEL_CARD.md)
	- Runtime audit traces: [build-small-hackathon/kirana-detective-traces](https://huggingface.co/datasets/build-small-hackathon/kirana-detective-traces)
	- Build sessions (Claude Code): [build-small-hackathon/kirana-detective-build-traces](https://huggingface.co/datasets/build-small-hackathon/kirana-detective-build-traces)
	- PRD: [docs/kirana-detective-prd.md](docs/kirana-detective-prd.md)

	---

	## License

	- Code: MIT
	- MiniCPM-V / MiniCPM5-1B: Apache 2.0 (OpenBMB)
	- YOLO26n: AGPL-3.0 (Ultralytics)

	---

	HuggingFace Build Small Hackathon 2026 · Track 1: Backyard AI · [naazimsnh02](https://github.com/naazimsnh02)