Spaces:

build-small-hackathon
/

kirana-detective

Sleeping

App Files Files Community

kirana-detective / docs /kirana-detective-prd.md

naazimsnh02

All models training uploaded

9d75c8c 8 days ago

preview code

Raw

History Blame Contribute Delete

26.4 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Kirana Detective AI

AI-Powered Inventory & Invoice Auditor for Indian Kirana Stores

Field	Value
Version	MVP v1.0
Hackathon	Hugging Face Build Small Hackathon 2026
Track	Track 1: Backyard AI
Deadline	June 15, 2026

Executive Summary

Kirana Detective AI helps Indian kirana store owners detect profit leakage by automatically auditing invoices, validating deliveries, identifying pricing anomalies, and comparing invoice quantities against actual products visible in shelf or carton photos.

The system acts as an AI-powered business auditor that helps small retailers identify billing errors, missing products, supplier discrepancies, and inventory issues that would otherwise go unnoticed.

Unlike generic AI assistants, Kirana Detective solves a highly specific problem for a clearly defined user group and produces measurable financial value — a rupee savings number that is concrete, judge-friendly, and immediately relatable to any Indian evaluator.

Problem Statement

India has approximately 12 million kirana stores. Most operate with:

Printed invoices from distributors
WhatsApp invoice screenshots
Manual or no delivery verification
Informal bookkeeping

Common Loss Sources

Issue	Example
Supplier overcharging	Charged ₹255 for Surf Excel, should be ₹220
Missing delivery items	Invoice says 50 Coke bottles, 46 delivered
Incorrect GST applied	Aashirvaad Atta at 12% instead of 5%
Duplicate invoice lines	Same product charged twice
Unclaimed distributor discounts	Buy-10-get-1 offer never applied
Dead inventory	Corn Flakes unsold for 75 days

Each mistake is small. Monthly losses accumulate to ₹2,000–₹20,000 per store.

Store owners rarely have time to manually audit invoices and deliveries. Kirana Detective becomes their AI auditor.

Vision

"Find where money is being lost."

The goal is not accounting. The goal is detecting profit leakage and converting every finding into a rupee value.

Primary User

Ravi — Kirana Store Owner, Chennai

Runs a neighbourhood provision store
Receives 3–5 distributor invoices per week
Gets most invoices via WhatsApp
Uses an Android phone
Low technical skill — needs a tap-and-see interface
Loses approximately ₹3,000–₹8,000/month to undetected billing errors

Success Metrics

Metric	Target
Detected savings per audit	≥ ₹500 shown to user
Invoice audit time	< 60 seconds
Delivery verification accuracy	≥ 80% on carton photos
"Actually used it" proof	Demo video with real kirana owner

MVP Scope (Must-Build for Hackathon)

Focus ruthlessly on this single killer workflow:

Invoice Upload → Delivery Photo Upload → Missing Product Detection → ₹ Savings Report

Must Have

✅ Invoice image / PDF upload
✅ Invoice OCR and structured extraction
✅ Product name normalization
✅ Price anomaly detection vs. historical invoices
✅ Delivery photo upload and product counting (YOLO26n)
✅ Invoice vs. delivery reconciliation
✅ Profit leakage dashboard with ₹ savings total
✅ Agent trace logging (for Sharing is Caring badge)
✅ Custom Gradio UI (not default theme)

Deferred to Future

Expiry date detection
Dead stock / slow-moving inventory alerts
Supplier trust score
Supplier negotiation insights
Multi-store analytics
WhatsApp bot integration
Demand forecasting

Core Features (MVP)

Feature 1 — Invoice Understanding

Input: Invoice image (photo, PDF, WhatsApp screenshot)

Model: MiniCPM-V 4.6 (fine-tuned on Indian invoice formats)

AI Tasks:

OCR extraction of all invoice fields
Handling mixed English + Tamil/Hindi/Telugu text
Parsing Tally printouts, handwritten bills, GST invoices

Output: Structured Invoice JSON

{
  "invoice_number": "INV-2024-8821",
  "supplier": "Hindustan Unilever Ltd",
  "date": "2026-06-08",
  "items": [
    {
      "product_raw": "SURF EXCEL 1KG",
      "product_normalized": "Surf Excel Washing Powder 1kg",
      "quantity": 10,
      "unit_price": 255.00,
      "gst_rate": 18,
      "line_total": 2550.00
    }
  ],
  "grand_total": 2550.00
}

Feature 2 — Product Name Normalization

Problem: Distributor invoices use inconsistent product names.

Invoice Text	Normalized Name
MAGGI 70GM	Nestle Maggi Masala Noodles 70g
MAGGI NDL	Nestle Maggi Masala Noodles 70g
SURF XL 1K	Surf Excel Washing Powder 1kg
PARLE G 80	Parle-G Biscuit 80g
COLGAT 100G	Colgate Strong Teeth Toothpaste 100g

Model: Fine-tuned MiniCPM5-1B on Indian FMCG SKU normalization dataset

Output: Consistent product catalog entries that allow historical price comparisons across different invoices from the same supplier.

Feature 3 — Pricing Anomaly Detection

Logic: Rule-based comparison against stored historical invoice data.

Example:

Product: Surf Excel Washing Powder 1kg

Historical price (last 3 invoices):
  ₹220 | ₹220 | ₹222

Current invoice price: ₹255

⚠ Price increase detected: +15.9%
Estimated excess charge (10 units): ₹330

No ML needed here — arithmetic + historical lookup is both sufficient and more trustworthy than a model for financial comparisons.

Feature 4 — Duplicate Charge Detection

Logic: Rule-based scan of extracted invoice JSON.

Detects:

Same product appearing twice in one invoice
Same invoice number submitted twice across sessions
Repeated line items with identical product + qty + price

Output:

⚠ Duplicate detected: Parle-G 80g appears twice on this invoice.
Combined quantity: 40 units | Possible duplicate charge: ₹320

Feature 5 — Delivery Verification (Visual Counting)

This is the centrepiece feature — the most visually impressive for the demo.

Input: Invoice JSON (from Feature 1) + 1–5 delivery photos

Model: YOLO26n fine-tuned on Indian FMCG products (see Model Stack section)

Pipeline:

Delivery Photo
      ↓
YOLO26n-nano (ONNX, local)
  → Detect bounding boxes
  → Count instances per product class
  → Output: {Coke 200ml: 20, Maggi 70g: 48}
      ↓
MiniCPM-V 4.6
  → Cross-verify with invoice context
  → Generate natural-language summary
      ↓
Reconciliation Agent
  → Invoice qty vs detected qty
  → Calculate ₹ shortage value

Example Output:

Invoice expects: Coke 200ml × 24
Detected in photo: 20 bottles

⚠ Shortage: 4 bottles
Estimated loss: ₹180

Important scope note: Multi-image counting (Feature 6 in the original PRD) is simplified — the user uploads up to 5 photos of the same delivery, counts are aggregated, then reconciled against the invoice. No complex carton-stacking estimation is attempted.

Feature 6 — Profit Leakage Dashboard

The "wow" output — everything converts to ₹.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  KIRANA DETECTIVE — AUDIT REPORT
  Supplier: HUL | Invoice: INV-8821
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  ⚠  Pricing Issues
     Surf Excel 1kg: +15.9% vs history ...... ₹330

  ⚠  Delivery Shortage
     Coke 200ml: 4 bottles missing ........... ₹180
     Maggi 70g: 2 packets missing ............. ₹28

  ⚠  Duplicate Charge
     Parle-G 80g: possible duplicate ........ ₹320

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  💰 TOTAL LEAKAGE DETECTED:    ₹858
  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Actions:
  → Contact HUL rep about price increase
  → Request credit note for 4 Coke bottles
  → Verify Parle-G line item with distributor

AI Agent Workflow

This multi-agent pipeline is explicitly designed for the Best Agent award.

┌─────────────────────────────────────────┐
│           USER UPLOADS                  │
│   Invoice Image + Delivery Photos       │
└──────────────┬──────────────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 1: Invoice Extraction     │
│  Model: MiniCPM-V 4.6 (ft)      │
│  Input:  Invoice image/PDF       │
│  Output: Structured invoice JSON │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 2: Product Matching       │
│  Model: MiniCPM5-1B (ft)        │
│  Input:  Raw product names       │
│  Output: Normalized product IDs  │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 3: Pricing Agent          │
│  Logic:  Rule-based              │
│  Input:  Normalized invoice      │
│  Output: Price anomaly flags     │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 4: Visual Counting Agent  │
│  Model: YOLO26n-FMCG (ft)      │
│  Input:  Delivery photos         │
│  Output: {product: count} dict   │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 5: Reconciliation Agent   │
│  Logic:  Rule-based              │
│  Input:  Invoice qty + Photo qty │
│  Output: Shortage flags + ₹ loss │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 6: Savings Agent          │
│  Model: MiniCPM5-1B              │
│  Input:  All flags               │
│  Output: ₹ report + action items │
└──────────────────────────────────┘

Agent trace is logged and shared on HuggingFace Hub → Sharing is Caring badge.

Model Stack

Primary Vision Model — MiniCPM-V 4.6

Property	Value
Developer	OpenBMB (Tsinghua University)
Parameters	1.3B
Release status	Current MiniCPM-V 4.6 family model; released in 2026
Strengths	OCR-heavy document understanding, image/video inputs, edge-friendly multimodal reasoning
Architecture note	SigLIP2-400M vision encoder + Qwen3.5-0.8B LLM
GGUF / local support	Yes — supports llama.cpp/GGUF deployment for Off the Grid + Llama Champion badges
Why chosen	Best fit for invoice OCR under the 32B cap and directly targets OpenBMB sponsor prize

Tasks: Invoice OCR, final report generation, cross-verification narration

Counting Model — YOLO26n (Fine-tuned)

Property	Value
Developer	Ultralytics
Parameters	~2.4M fused model
Strengths	Faster CPU ONNX inference than YOLO11n, accurate object detection + counting, edge-friendly
Export format	ONNX (local inference, no llama.cpp needed)
Why chosen	Latest Ultralytics nano detector; purpose-built for counting while VLMs hallucinate on dense product scenes

Tasks: Detect and count FMCG products in delivery photos

Design decision: YOLO26n handles counting because VLMs like MiniCPM-V underperform on dense shelf scenes with 20–50 identical objects. Each model does what it does best.

Agent Orchestration Model — MiniCPM5-1B

Property	Value
Developer	OpenBMB
Parameters	1.08B
Context length	131,072 tokens
Strengths	Tool use, reasoning, code/JSON generation, workflow orchestration, report generation
GGUF support	Yes — official GGUF release supports llama.cpp/Ollama/LM Studio workflows
Why chosen	Current OpenBMB 1B-class model, better aligned than the older MiniCPM3 reference and strengthens OpenBMB prize positioning

Tasks: Product normalization, agent orchestration, savings report text generation

Parameter Budget

Component	Model	Parameters
Invoice/document vision	MiniCPM-V 4.6	1.3B
Product normalization + agent text	MiniCPM5-1B	1.08B
Product detection/counting	YOLO26n	~2.4M
Total active model budget	Combined stack	~2.38B

This keeps the app far below the hackathon's 32B cap and within the Tiny Titan special-award range (<=4B), while still using separate models for the tasks they handle best.

Current Model References

MiniCPM-V 4.6: openbmb/MiniCPM-V-4.6
MiniCPM5-1B: openbmb/MiniCPM5-1B and openbmb/MiniCPM5-1B-GGUF
YOLO26n: Ultralytics YOLO26 nano detector, exported to ONNX after fine-tuning

Fine-Tuning Strategy

What to Fine-Tune (and Why)

1. MiniCPM-V 4.6 — Invoice Extraction

Why: Indian invoice formats (Tally printouts, WhatsApp screenshots, handwritten GST bills, mixed-language text) are not well-represented in the base model's training data. Fine-tuning on 300–500 synthetic Indian invoices dramatically improves structured JSON output.

Dataset: Synthetically generated using Claude/GPT — 500 invoices across:

10 major Indian FMCG suppliers (HUL, Nestlé, Parle, Britannia, ITC, Amul, Dabur, Marico, Emami, Godrej)
4 invoice formats (printed GST bill, handwritten, Tally export, WhatsApp screenshot)
Intentional errors: wrong GST, duplicate lines, price spikes

Platform: Modal + Unsloth QLoRA (~2–3 hours training time)

Publish to: build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged

2. YOLO26n — Indian FMCG Product Detection

Why: Base YOLO26n is not trained on Indian grocery products. Fine-tuning on the existing Indian Grocery Object Detection dataset (Roboflow) gives the model the ability to reliably detect Parle-G, Maggi, Amul, Britannia, HUL products in kirana shelf/delivery photos.

Dataset: Indian Grocery Object Detection — Roboflow — already annotated with bounding boxes for common Indian FMCG SKUs.

Training: Ultralytics fine-tune on Modal GPU (~1–2 hours)

Export: ONNX for local CPU inference

Publish to: build-small-hackathon/yolo26n-indian-fmcg-detection

3. MiniCPM5-1B — Product Name Normalization

Why: "MAGGI NDL 70GM", "MAGGI MASALA", and "MAGGI 70G" should all map to "Nestle Maggi Masala Noodles 70g". This requires Indian FMCG domain knowledge a general 1B model lacks.

Dataset: 2,000 synthetic (raw_name, normalized_name) pairs covering top 200 Indian FMCG SKUs

Publish to: build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer

What NOT to Fine-Tune

Task	Why Not
GST rate validation	Pure lookup table by HSN code. 0/5/12/18/28%. Deterministic.
Price anomaly detection	Simple arithmetic vs. stored history. More trustworthy without ML.
Duplicate detection	String matching + invoice ID comparison.
Savings calculation	Arithmetic. No model needed.
Supplier trust scoring	Aggregation of existing rule-based signals.

Indian Context — Training Data Coverage

FMCG Brands (Invoice Normalization Dataset)

Food: Parle-G, Good Day, Britannia Marie, Maggi, Yippee, Aashirvaad Atta, Tata Salt, Amul Butter, Mother Dairy, Aavin

Home Care: Surf Excel, Rin, Vim, Harpic, Lizol, Domex, Scotch-Brite, Mortein

Personal Care: Colgate, Pepsodent, Clinic Plus, Pantene, Lux, Dove, Lifebuoy, Dettol, Parachute

Beverages: Coca-Cola, Pepsi, Sprite, Thums Up, Frooti, Maaza, Bovonto (South India)

GST Rate Lookup (Rule-Based, Not Fine-Tuned)

Rate	Example Products
0%	Fresh milk, eggs, vegetables
5%	Packaged food, Atta, Dal, edible oil
12%	Butter, ghee, packaged dry fruits
18%	Soap, shampoo, toothpaste, detergent
28%	Aerated drinks, tobacco

Regional Language Support

Invoice OCR handles mixed-language text including English, Tamil, Hindi, and Telugu — common in South Indian distributor invoices.

Award Strategy

OpenBMB Award

How: MiniCPM-V 4.6 is the primary vision model for OCR, cross-verification, and report generation. MiniCPM5-1B handles orchestration, normalization, and report text. Both are current OpenBMB models, making the product visibly built around the sponsor's ecosystem.

OpenAI Track

How: The project is built with Codex as the primary coding agent, with Codex-authored commits and implementation traces included in the submission materials. The demo should explicitly show how Codex accelerated the build and helped produce the final Gradio app, making OpenAI's contribution load-bearing without adding a cloud API dependency.

Modal Awards

How: Modal is used for the fine-tuning runs for MiniCPM-V 4.6, MiniCPM5-1B, and YOLO26n, with training logs, artifacts, and published Hugging Face model links included in the Field Notes post. Modal is not just incidental infrastructure; it is the training engine that makes the local-first app domain-specific.

Best Agent Award

How: Six-agent pipeline with clear separation of concerns, visible agent trace logged to HuggingFace Hub. Not a single LLM call — genuine tool-using agent workflow.

Well-Tuned Badge 🎯

How: Three fine-tuned models published on HuggingFace:

minicpm-v-4-6-indian-invoice-extraction
yolo26n-indian-fmcg-detection
minicpm5-1b-indian-fmcg-normalizer

Off the Grid Badge 🔌

How: MiniCPM-V 4.6 GGUF via llama.cpp + MiniCPM5-1B GGUF via llama.cpp + YOLO26n ONNX — entire pipeline runs locally, zero cloud API calls.

Llama Champion Badge 🦙

How: MiniCPM-V 4.6 and MiniCPM5-1B are served via llama.cpp using their GGUF quantized versions.

Off-Brand Badge 🎨

How: Custom Gradio UI — not default theme. Audit report card design with ₹ savings prominently displayed, colour-coded anomaly flags, and clean mobile-friendly layout.

Sharing is Caring Badge 📡

How: Agent trace logged after each audit run and shared as a HuggingFace dataset artifact.

Field Notes Badge 📓

How: Blog post: "How I built an AI auditor for India's 12 million kirana stores" — covering dataset creation, fine-tuning decisions, real-world testing with a store owner.

Bonus Quest Champion

How: Stack the largest credible set of badges on one polished submission: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is Caring, and Field Notes.

Tiny Titan

How: Total active model budget is approximately 2.38B parameters, comfortably below the <=4B Tiny Titan threshold while still handling OCR, agentic reasoning, normalization, and product counting.

Best Demo

How: The video centers on one concrete, emotional story: a real kirana owner finds a rupee-denominated loss, sees the missing items visually highlighted, and gets a practical supplier action list. The demo should show the app working, the owner reaction, the agent trace, and the final savings number.

Community Choice

How: Make the Space immediately understandable: upload sample invoice, upload sample delivery photos, run audit, see rupee savings. Pair the Space with a short social post using the India kirana angle and the "find where money is being lost" tagline.

NVIDIA Nemotron Quest

Decision: Explicitly not targeted. Chasing Nemotron would force a major stack change and weaken the OpenBMB/local-first Tiny Titan story. The submission focuses on Backyard AI, OpenBMB, OpenAI, Modal, and the bonus badges instead.

Gradio UI Design

Screen 1 — Upload

┌─────────────────────────────────────────┐
│  🔍 KIRANA DETECTIVE                    │
│  Your AI Business Auditor               │
├─────────────────────────────────────────┤
│                                         │
│  [ 📄 Upload Invoice ]                  │
│  Photo / PDF / WhatsApp screenshot      │
│                                         │
│  [ 📷 Upload Delivery Photos ]          │
│  Up to 5 photos of received goods       │
│                                         │
│  Supplier Name: ___________________     │
│                                         │
│  [ 🔍 Run Audit ]                       │
│                                         │
└─────────────────────────────────────────┘

Screen 2 — Results Dashboard

┌─────────────────────────────────────────┐
│  AUDIT COMPLETE — HUL | INV-8821        │
├─────────────────────────────────────────┤
│                                         │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐ │
│  │ ⚠ Price │  │ ⚠ Short │  │ ⚠ Dupli │ │
│  │  ₹330   │  │  ₹208   │  │  ₹320   │ │
│  └─────────┘  └─────────┘  └─────────┘ │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  💰 TOTAL LEAKAGE: ₹858        │    │
│  └─────────────────────────────────┘    │
│                                         │
│  [ 📋 Full Report ]  [ 📤 Share ]       │
│                                         │
└─────────────────────────────────────────┘

Demo Story (For Submission Video)

Ravi, a kirana store owner in Chennai, uploads one invoice from his HUL distributor and three photos of the goods delivered that morning.

In 45 seconds, Kirana Detective finds:

Surf Excel is being charged 15.9% above the historical price

4 Coke bottles are missing from the delivery

A Parle-G line item appears to be duplicated

Total leakage detected: ₹858

Ravi calls his distributor. The credit note is issued the same day.

This outcome is specific, measurable, and achievable in a real demo — exactly what Backyard AI judges want to see.

10-Day Build Plan

Day	Task	Model/Tool	Risk
1	Fine-tune YOLO26n on Roboflow Indian Grocery dataset	Modal GPU, Ultralytics	Low
2	Generate 500 synthetic Indian invoices; fine-tune MiniCPM-V 4.6 extraction	Modal + Unsloth	Medium
3	Fine-tune MiniCPM5-1B product normalizer; publish all 3 models to HF	Modal + Unsloth	Low
4	Build invoice OCR pipeline in Gradio: upload → MiniCPM-V → JSON	Python + Gradio	Medium
5	Build YOLO26n delivery counting pipeline: photo → count dict	ONNX Runtime	Medium
6	Build reconciliation agent + pricing anomaly detection	Rule-based Python	Low
7	Build custom Gradio dashboard UI with ₹ savings cards	Gradio + CSS	Low
8	Wire all agents together; implement trace logging; deploy to HF Space	LangGraph / custom	Medium
9	Test with real kirana owner; record demo video; capture Codex-authored commit/story proof	Codex + real user testing	Low
10	Write Field Notes blog; share agent trace; include Modal logs and final submission assets	HF Dataset + Modal logs	Low

Technical Stack Summary

Component	Technology
Frontend	Gradio (custom theme, Off-Brand)
Hosting	Hugging Face Spaces
Primary VLM	MiniCPM-V 4.6 (GGUF via llama.cpp)
Agent Orchestrator	MiniCPM5-1B (GGUF via llama.cpp)
Counting Model	YOLO26n fine-tuned (ONNX, local)
Fine-tuning Platform	Modal + Unsloth (training engine for sponsor eligibility)
Build Agent	OpenAI Codex (commit author + build trace for OpenAI Track positioning)
Invoice parsing	PyMuPDF (PDF) + Gradio Image input
Data storage	Local JSON / SQLite (no cloud DB)
Agent tracing	Custom trace logger → HF Dataset

Risk Register

Risk	Likelihood	Mitigation
MiniCPM-V GGUF has high latency on CPU	Medium	Use 4-bit quantized Q4_K_M; fall back to float16 on HF Space GPU
YOLO26n misses products not in Roboflow dataset	Medium	Limit demo to top 10 products; expand post-hackathon
Delivery photo quality too low for counting	High	Show demo with clean carton photos; add "photo quality tip" in UI
Fine-tuning time exceeds budget	Low	All 3 models trainable in < 6 hours total on Modal A10G
OpenAI Track story looks indirect	Medium	Make Codex visible in commit metadata, implementation trace, Field Notes, and demo narrative
Modal usage looks incidental	Low	Publish Modal training logs/artifacts and explicitly link fine-tuned models to Modal runs
Scope creep during build week	High	Freeze scope at Day 3; no new features after Day 6

Kirana Detective AI — Build Small Hackathon 2026 — Track 1: Backyard AI