# Kirana Detective AI

## AI-Powered Inventory & Invoice Auditor for Indian Kirana Stores

| Field | Value |
|---|---|
| Version | MVP v1.0 |
| Hackathon | Hugging Face Build Small Hackathon 2026 |
| Track | Track 1: Backyard AI |
| Deadline | June 15, 2026 |

---

## Executive Summary

Kirana Detective AI helps Indian kirana store owners detect profit leakage by automatically auditing invoices, validating deliveries, identifying pricing anomalies, and comparing invoice quantities against actual products visible in shelf or carton photos.

The system acts as an AI-powered business auditor that helps small retailers identify billing errors, missing products, supplier discrepancies, and inventory issues that would otherwise go unnoticed.

Unlike generic AI assistants, Kirana Detective solves a highly specific problem for a clearly defined user group and produces measurable financial value — a rupee savings number that is concrete, judge-friendly, and immediately relatable to any Indian evaluator.

---

## Problem Statement

India has approximately 12 million kirana stores. Most operate with:

- Printed invoices from distributors
- WhatsApp invoice screenshots
- Manual or no delivery verification
- Informal bookkeeping

### Common Loss Sources

| Issue | Example |
|---|---|
| Supplier overcharging | Charged ₹255 for Surf Excel, should be ₹220 |
| Missing delivery items | Invoice says 50 Coke bottles, 46 delivered |
| Incorrect GST applied | Aashirvaad Atta at 12% instead of 5% |
| Duplicate invoice lines | Same product charged twice |
| Unclaimed distributor discounts | Buy-10-get-1 offer never applied |
| Dead inventory | Corn Flakes unsold for 75 days |

Each mistake is small. Monthly losses accumulate to ₹2,000–₹20,000 per store.

Store owners rarely have time to manually audit invoices and deliveries.

**Kirana Detective becomes their AI auditor.**

---

## Vision

> "Find where money is being lost."

The goal is not accounting.
The goal is detecting profit leakage and converting every finding into a rupee value.

---

## Primary User

**Ravi — Kirana Store Owner, Chennai**

- Runs a neighbourhood provision store
- Receives 3–5 distributor invoices per week
- Gets most invoices via WhatsApp
- Uses an Android phone
- Low technical skill — needs a tap-and-see interface
- Loses approximately ₹3,000–₹8,000/month to undetected billing errors

---

## Success Metrics

| Metric | Target |
|---|---|
| Detected savings per audit | ≥ ₹500 shown to user |
| Invoice audit time | < 60 seconds |
| Delivery verification accuracy | ≥ 80% on carton photos |
| "Actually used it" proof | Demo video with real kirana owner |

---

## MVP Scope (Must-Build for Hackathon)

Focus ruthlessly on this single killer workflow:

```
Invoice Upload → Delivery Photo Upload → Missing Product Detection → ₹ Savings Report
```

### Must Have

- ✅ Invoice image / PDF upload
- ✅ Invoice OCR and structured extraction
- ✅ Product name normalization
- ✅ Price anomaly detection vs. historical invoices
- ✅ Delivery photo upload and product counting (YOLO26n)
- ✅ Invoice vs.
  delivery reconciliation
- ✅ Profit leakage dashboard with ₹ savings total
- ✅ Agent trace logging (for Sharing is Caring badge)
- ✅ Custom Gradio UI (not default theme)

### Deferred to Future

- Expiry date detection
- Dead stock / slow-moving inventory alerts
- Supplier trust score
- Supplier negotiation insights
- Multi-store analytics
- WhatsApp bot integration
- Demand forecasting

---

## Core Features (MVP)

### Feature 1 — Invoice Understanding

**Input:** Invoice image (photo, PDF, WhatsApp screenshot)

**Model:** MiniCPM-V 4.6 (fine-tuned on Indian invoice formats)

**AI Tasks:**

- OCR extraction of all invoice fields
- Handling mixed English + Tamil/Hindi/Telugu text
- Parsing Tally printouts, handwritten bills, GST invoices

**Output: Structured Invoice JSON**

```json
{
  "invoice_number": "INV-2024-8821",
  "supplier": "Hindustan Unilever Ltd",
  "date": "2026-06-08",
  "items": [
    {
      "product_raw": "SURF EXCEL 1KG",
      "product_normalized": "Surf Excel Washing Powder 1kg",
      "quantity": 10,
      "unit_price": 255.00,
      "gst_rate": 18,
      "line_total": 2550.00
    }
  ],
  "grand_total": 2550.00
}
```

---

### Feature 2 — Product Name Normalization

**Problem:** Distributor invoices use inconsistent product names.

| Invoice Text | Normalized Name |
|---|---|
| MAGGI 70GM | Nestle Maggi Masala Noodles 70g |
| MAGGI NDL | Nestle Maggi Masala Noodles 70g |
| SURF XL 1K | Surf Excel Washing Powder 1kg |
| PARLE G 80 | Parle-G Biscuit 80g |
| COLGAT 100G | Colgate Strong Teeth Toothpaste 100g |

**Model:** Fine-tuned MiniCPM5-1B on Indian FMCG SKU normalization dataset

**Output:** Consistent product catalog entries that allow historical price comparisons across different invoices from the same supplier.

---

### Feature 3 — Pricing Anomaly Detection

**Logic:** Rule-based comparison against stored historical invoice data.
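
A minimal sketch of this rule-based comparison is given here. It is not the project's implementation: the function name `price_anomaly`, the 5% threshold, and the choice of baselines are all assumptions made for illustration. The percentage is measured against the median historical price and the excess charge against the most recent price, which is one reading that reproduces the Surf Excel figures used in this section (+15.9%, ₹330).

```python
from statistics import median


def price_anomaly(history, current_price, quantity, threshold_pct=5.0):
    """Return an anomaly record, or None if the price looks normal.

    history: past unit prices for the same normalized product, oldest first.
    threshold_pct: hypothetical cut-off; tune against real invoice data.
    """
    if not history:
        return None  # no baseline yet, nothing to compare against

    baseline = median(history)  # assumed baseline for the % change
    pct_change = (current_price - baseline) / baseline * 100
    if pct_change < threshold_pct:
        return None

    # Assumed: excess is measured against the most recent invoice price.
    excess = (current_price - history[-1]) * quantity
    return {
        "baseline": baseline,
        "pct_change": round(pct_change, 1),
        "excess_charge": round(excess, 2),
    }


flag = price_anomaly([220, 220, 222], current_price=255, quantity=10)
print(flag)
```

Keeping both baselines as explicit lines in the function makes it easy to switch to a single baseline later if the two figures should be computed against the same reference price.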

**Example:**

```
Product: Surf Excel Washing Powder 1kg
Historical price (last 3 invoices): ₹220 | ₹220 | ₹222
Current invoice price: ₹255

⚠ Price increase detected: +15.9%
  Estimated excess charge (10 units): ₹330
```

> No ML needed here — arithmetic + historical lookup is both sufficient and more trustworthy than a model for financial comparisons.

---

### Feature 4 — Duplicate Charge Detection

**Logic:** Rule-based scan of extracted invoice JSON.

**Detects:**

- Same product appearing twice in one invoice
- Same invoice number submitted twice across sessions
- Repeated line items with identical product + qty + price

**Output:**

```
⚠ Duplicate detected: Parle-G 80g appears twice on this invoice.
  Combined quantity: 40 units | Possible duplicate charge: ₹320
```

---

### Feature 5 — Delivery Verification (Visual Counting)

**This is the centrepiece feature — the most visually impressive for the demo.**

**Input:** Invoice JSON (from Feature 1) + 1–5 delivery photos

**Model:** YOLO26n fine-tuned on Indian FMCG products (see Model Stack section)

**Pipeline:**

```
Delivery Photo
      ↓
YOLO26n (ONNX, local)
  → Detect bounding boxes
  → Count instances per product class
  → Output: {Coke 200ml: 20, Maggi 70g: 48}
      ↓
MiniCPM-V 4.6
  → Cross-verify with invoice context
  → Generate natural-language summary
      ↓
Reconciliation Agent
  → Invoice qty vs detected qty
  → Calculate ₹ shortage value
```

**Example Output:**

```
Invoice expects: Coke 200ml × 24
Detected in photo: 20 bottles
⚠ Shortage: 4 bottles
  Estimated loss: ₹180
```

**Important scope note:** Multi-image counting (Feature 6 in the original PRD) is simplified — the user uploads up to 5 photos of the same delivery, counts are aggregated, then reconciled against the invoice. No complex carton-stacking estimation is attempted.
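
The aggregate-then-reconcile step described in the scope note can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the helper names `aggregate_counts` and `find_shortages` are hypothetical, aggregation is assumed to be a plain sum (so each photo must show a different part of the delivery, with no overlap), and the unit prices of ₹45 and ₹14 and the Maggi invoice quantity of 50 are inferred from the loss figures in the audit report rather than stated anywhere in the document. The invoice item keys follow the Feature 1 JSON.

```python
from collections import Counter


def aggregate_counts(per_photo_counts):
    """Sum per-photo {product: count} dicts into one delivery total.

    Assumes photos do not overlap; overlapping photos would double-count.
    """
    total = Counter()
    for counts in per_photo_counts:
        total.update(counts)
    return dict(total)


def find_shortages(invoice_items, detected):
    """Compare invoiced quantities with detected counts, item by item."""
    shortages = []
    for item in invoice_items:
        name = item["product_normalized"]
        missing = item["quantity"] - detected.get(name, 0)
        if missing > 0:
            shortages.append({
                "product": name,
                "missing": missing,
                "loss": missing * item["unit_price"],  # rupee value of the gap
            })
    return shortages


photos = [{"Coke 200ml": 12, "Maggi 70g": 48}, {"Coke 200ml": 8}]
invoice = [
    {"product_normalized": "Coke 200ml", "quantity": 24, "unit_price": 45.0},
    {"product_normalized": "Maggi 70g", "quantity": 50, "unit_price": 14.0},
]
for shortage in find_shortages(invoice, aggregate_counts(photos)):
    print(shortage)
```

A product that never appears in any photo is treated as fully missing, which may overstate losses when a photo simply fails to capture a carton; a real build would probably want to surface that case separately.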

---

### Feature 6 — Profit Leakage Dashboard

**The "wow" output — everything converts to ₹.**

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
KIRANA DETECTIVE — AUDIT REPORT
Supplier: HUL | Invoice: INV-8821
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚠ Pricing Issues
  Surf Excel 1kg: +15.9% vs history ...... ₹330

⚠ Delivery Shortage
  Coke 200ml: 4 bottles missing ........... ₹180
  Maggi 70g: 2 packets missing ............. ₹28

⚠ Duplicate Charge
  Parle-G 80g: possible duplicate ........ ₹320

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💰 TOTAL LEAKAGE DETECTED: ₹858
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Actions:
→ Contact HUL rep about price increase
→ Request credit note for 4 Coke bottles
→ Verify Parle-G line item with distributor
```

---

## AI Agent Workflow

This multi-agent pipeline is explicitly designed for the **Best Agent** award.

```
┌─────────────────────────────────────────┐
│              USER UPLOADS               │
│     Invoice Image + Delivery Photos     │
└──────────────┬──────────────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 1: Invoice Extraction     │
│  Model: MiniCPM-V 4.6 (ft)       │
│  Input: Invoice image/PDF        │
│  Output: Structured invoice JSON │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 2: Product Matching       │
│  Model: MiniCPM5-1B (ft)         │
│  Input: Raw product names        │
│  Output: Normalized product IDs  │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 3: Pricing Agent          │
│  Logic: Rule-based               │
│  Input: Normalized invoice       │
│  Output: Price anomaly flags     │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 4: Visual Counting Agent  │
│  Model: YOLO26n-FMCG (ft)        │
│  Input: Delivery photos          │
│  Output: {product: count} dict   │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 5: Reconciliation Agent   │
│  Logic: Rule-based               │
│  Input: Invoice qty + Photo qty  │
│  Output: Shortage flags + ₹ loss │
└──────────────┬───────────────────┘
               ↓
┌──────────────────────────────────┐
│  Agent 6: Savings Agent          │
│  Model: MiniCPM5-1B              │
│  Input: All flags                │
│  Output: ₹ report + action items │
└──────────────────────────────────┘
```

Agent trace is logged and shared on HuggingFace Hub → **Sharing is Caring badge**.

---

## Model Stack

### Primary Vision Model — MiniCPM-V 4.6

| Property | Value |
|---|---|
| Developer | OpenBMB (Tsinghua University) |
| Parameters | 1.3B |
| Release status | Current MiniCPM-V 4.6 family model; released in 2026 |
| Strengths | OCR-heavy document understanding, image/video inputs, edge-friendly multimodal reasoning |
| Architecture note | SigLIP2-400M vision encoder + Qwen3.5-0.8B LLM |
| GGUF / local support | Yes — supports llama.cpp/GGUF deployment for Off the Grid + Llama Champion badges |
| Why chosen | Best fit for invoice OCR under the 32B cap and directly targets OpenBMB sponsor prize |

**Tasks:** Invoice OCR, final report generation, cross-verification narration

---

### Counting Model — YOLO26n (Fine-tuned)

| Property | Value |
|---|---|
| Developer | Ultralytics |
| Parameters | ~2.4M fused model |
| Strengths | Faster CPU ONNX inference than YOLO11n, accurate object detection + counting, edge-friendly |
| Export format | ONNX (local inference, no llama.cpp needed) |
| Why chosen | Latest Ultralytics nano detector; purpose-built for counting while VLMs hallucinate on dense product scenes |

**Tasks:** Detect and count FMCG products in delivery photos

> **Design decision:** YOLO26n handles counting because VLMs like MiniCPM-V underperform on dense shelf scenes with 20–50 identical objects. Each model does what it does best.
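
Turning detector output into the `{product: count}` dict that the counting agent emits can be sketched as below. This is a simplified illustration, not the actual inference code: it assumes the ONNX output has already been decoded and de-duplicated (non-maximum suppression applied) into label/confidence pairs, and the `Detection` type, the `count_products` helper, and the 0.5 confidence threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    """One decoded detection (hypothetical shape, post-NMS)."""
    label: str
    confidence: float


def count_products(detections: Iterable[Detection],
                   min_confidence: float = 0.5) -> dict:
    """Count confident detections per product class."""
    return dict(Counter(
        d.label for d in detections if d.confidence >= min_confidence
    ))


detections = [
    Detection("Coke 200ml", 0.91),
    Detection("Coke 200ml", 0.42),  # dropped: below the assumed threshold
    Detection("Maggi 70g", 0.77),
]
print(count_products(detections))
```

The threshold trades missed items against phantom ones: set too high, real bottles are dropped and a false shortage is reported; set too low, background clutter inflates the count and hides a real one.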

---

### Agent Orchestration Model — MiniCPM5-1B

| Property | Value |
|---|---|
| Developer | OpenBMB |
| Parameters | 1.08B |
| Context length | 131,072 tokens |
| Strengths | Tool use, reasoning, code/JSON generation, workflow orchestration, report generation |
| GGUF support | Yes — official GGUF release supports llama.cpp/Ollama/LM Studio workflows |
| Why chosen | Current OpenBMB 1B-class model, better aligned than the older MiniCPM3 reference and strengthens OpenBMB prize positioning |

**Tasks:** Product normalization, agent orchestration, savings report text generation

---

### Parameter Budget

| Component | Model | Parameters |
|---|---|---|
| Invoice/document vision | MiniCPM-V 4.6 | 1.3B |
| Product normalization + agent text | MiniCPM5-1B | 1.08B |
| Product detection/counting | YOLO26n | ~2.4M |
| Total active model budget | Combined stack | ~2.38B |

This keeps the app far below the hackathon's 32B cap and within the Tiny Titan special-award range (<=4B), while still using separate models for the tasks they handle best.

### Current Model References

- MiniCPM-V 4.6: `openbmb/MiniCPM-V-4.6`
- MiniCPM5-1B: `openbmb/MiniCPM5-1B` and `openbmb/MiniCPM5-1B-GGUF`
- YOLO26n: Ultralytics YOLO26 nano detector, exported to ONNX after fine-tuning

---

## Fine-Tuning Strategy

### What to Fine-Tune (and Why)

#### 1. MiniCPM-V 4.6 — Invoice Extraction

**Why:** Indian invoice formats (Tally printouts, WhatsApp screenshots, handwritten GST bills, mixed-language text) are not well-represented in the base model's training data. Fine-tuning on 300–500 synthetic Indian invoices is expected to make structured JSON output substantially more reliable.

**Dataset:** Synthetically generated using Claude/GPT — 500 invoices across:

- 10 major Indian FMCG suppliers (HUL, Nestlé, Parle, Britannia, ITC, Amul, Dabur, Marico, Emami, Godrej)
- 4 invoice formats (printed GST bill, handwritten, Tally export, WhatsApp screenshot)
- Intentional errors: wrong GST, duplicate lines, price spikes

**Platform:** Modal + Unsloth QLoRA (~2–3 hours training time)

**Publish to:** `build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`

---

#### 2. YOLO26n — Indian FMCG Product Detection

**Why:** Base YOLO26n is not trained on Indian grocery products. Fine-tuning on the existing Indian Grocery Object Detection dataset (Roboflow) gives the model the ability to reliably detect Parle-G, Maggi, Amul, Britannia, HUL products in kirana shelf/delivery photos.

**Dataset:** [Indian Grocery Object Detection — Roboflow](https://universe.roboflow.com/agentsk47/indian-grocery-object-detection-mfsnx) — already annotated with bounding boxes for common Indian FMCG SKUs.

**Training:** Ultralytics fine-tune on Modal GPU (~1–2 hours)

**Export:** ONNX for local CPU inference

**Publish to:** `build-small-hackathon/yolo26n-indian-fmcg-detection`

---

#### 3. MiniCPM5-1B — Product Name Normalization

**Why:** "MAGGI NDL 70GM", "MAGGI MASALA", and "MAGGI 70G" should all map to "Nestle Maggi Masala Noodles 70g". This requires Indian FMCG domain knowledge a general 1B model lacks.

**Dataset:** 2,000 synthetic (raw_name, normalized_name) pairs covering top 200 Indian FMCG SKUs

**Publish to:** `build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`

---

### What NOT to Fine-Tune

| Task | Why Not |
|---|---|
| GST rate validation | Pure lookup table by HSN code. 0/5/12/18/28%. Deterministic. |
| Price anomaly detection | Simple arithmetic vs. stored history. More trustworthy without ML. |
| Duplicate detection | String matching + invoice ID comparison. |
| Savings calculation | Arithmetic. No model needed. |
| Supplier trust scoring | Aggregation of existing rule-based signals. |

---

## Indian Context — Training Data Coverage

### FMCG Brands (Invoice Normalization Dataset)

**Food:** Parle-G, Good Day, Britannia Marie, Maggi, Yippee, Aashirvaad Atta, Tata Salt, Amul Butter, Mother Dairy, Aavin

**Home Care:** Surf Excel, Rin, Vim, Harpic, Lizol, Domex, Scotch-Brite, Mortein

**Personal Care:** Colgate, Pepsodent, Clinic Plus, Pantene, Lux, Dove, Lifebuoy, Dettol, Parachute

**Beverages:** Coca-Cola, Pepsi, Sprite, Thums Up, Frooti, Maaza, Bovonto (South India)

### GST Rate Lookup (Rule-Based, Not Fine-Tuned)

| Rate | Example Products |
|---|---|
| 0% | Fresh milk, eggs, vegetables |
| 5% | Packaged food, Atta, Dal, edible oil |
| 12% | Butter, ghee, packaged dry fruits |
| 18% | Soap, shampoo, toothpaste, detergent |
| 28% | Aerated drinks, tobacco |

### Regional Language Support

Invoice OCR handles mixed-language text including English, Tamil, Hindi, and Telugu — common in South Indian distributor invoices.

---

## Award Strategy

### OpenBMB Award

**How:** MiniCPM-V 4.6 is the primary vision model for OCR, cross-verification, and report generation. MiniCPM5-1B handles orchestration, normalization, and report text. Both are current OpenBMB models, making the product visibly built around the sponsor's ecosystem.

### OpenAI Track

**How:** The project is built with Codex as the primary coding agent, with Codex-authored commits and implementation traces included in the submission materials. The demo should explicitly show how Codex accelerated the build and helped produce the final Gradio app, making OpenAI's contribution load-bearing without adding a cloud API dependency.

### Modal Awards

**How:** Modal is used for the fine-tuning runs for MiniCPM-V 4.6, MiniCPM5-1B, and YOLO26n, with training logs, artifacts, and published Hugging Face model links included in the Field Notes post.
Modal is not just incidental infrastructure; it is the training engine that makes the local-first app domain-specific.

### Best Agent Award

**How:** Six-agent pipeline with clear separation of concerns, visible agent trace logged to HuggingFace Hub. Not a single LLM call — genuine tool-using agent workflow.

### Well-Tuned Badge 🎯

**How:** Three fine-tuned models published on HuggingFace:

1. `minicpm-v-4-6-indian-invoice-extraction`
2. `yolo26n-indian-fmcg-detection`
3. `minicpm5-1b-indian-fmcg-normalizer`

### Off the Grid Badge 🔌

**How:** MiniCPM-V 4.6 GGUF via llama.cpp + MiniCPM5-1B GGUF via llama.cpp + YOLO26n ONNX — entire pipeline runs locally, zero cloud API calls.

### Llama Champion Badge 🦙

**How:** MiniCPM-V 4.6 and MiniCPM5-1B are served via llama.cpp using their GGUF quantized versions.

### Off-Brand Badge 🎨

**How:** Custom Gradio UI — not default theme. Audit report card design with ₹ savings prominently displayed, colour-coded anomaly flags, and clean mobile-friendly layout.

### Sharing is Caring Badge 📡

**How:** Agent trace logged after each audit run and shared as a HuggingFace dataset artifact.

### Field Notes Badge 📓

**How:** Blog post: *"How I built an AI auditor for India's 12 million kirana stores"* — covering dataset creation, fine-tuning decisions, real-world testing with a store owner.

### Bonus Quest Champion

**How:** Stack the largest credible set of badges on one polished submission: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is Caring, and Field Notes.

### Tiny Titan

**How:** Total active model budget is approximately 2.38B parameters, comfortably below the <=4B Tiny Titan threshold while still handling OCR, agentic reasoning, normalization, and product counting.

### Best Demo

**How:** The video centers on one concrete, emotional story: a real kirana owner finds a rupee-denominated loss, sees the missing items visually highlighted, and gets a practical supplier action list.
The demo should show the app working, the owner reaction, the agent trace, and the final savings number.

### Community Choice

**How:** Make the Space immediately understandable: upload sample invoice, upload sample delivery photos, run audit, see rupee savings. Pair the Space with a short social post using the India kirana angle and the "find where money is being lost" tagline.

### NVIDIA Nemotron Quest

**Decision:** Explicitly not targeted. Chasing Nemotron would force a major stack change and weaken the OpenBMB/local-first Tiny Titan story. The submission focuses on Backyard AI, OpenBMB, OpenAI, Modal, and the bonus badges instead.

---

## Gradio UI Design

### Screen 1 — Upload

```
┌─────────────────────────────────────────┐
│  🔍 KIRANA DETECTIVE                    │
│  Your AI Business Auditor               │
├─────────────────────────────────────────┤
│                                         │
│  [ 📄 Upload Invoice ]                  │
│  Photo / PDF / WhatsApp screenshot      │
│                                         │
│  [ 📷 Upload Delivery Photos ]          │
│  Up to 5 photos of received goods       │
│                                         │
│  Supplier Name: ___________________     │
│                                         │
│  [ 🔍 Run Audit ]                       │
│                                         │
└─────────────────────────────────────────┘
```

### Screen 2 — Results Dashboard

```
┌─────────────────────────────────────────┐
│  AUDIT COMPLETE — HUL | INV-8821        │
├─────────────────────────────────────────┤
│                                         │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ ⚠ Price │  │ ⚠ Short │  │ ⚠ Dupli │  │
│  │  ₹330   │  │  ₹208   │  │  ₹320   │  │
│  └─────────┘  └─────────┘  └─────────┘  │
│                                         │
│  ┌─────────────────────────────────┐    │
│  │  💰 TOTAL LEAKAGE: ₹858         │    │
│  └─────────────────────────────────┘    │
│                                         │
│  [ 📋 Full Report ]  [ 📤 Share ]       │
│                                         │
└─────────────────────────────────────────┘
```

---

## Demo Story (For Submission Video)

> Ravi, a kirana store owner in Chennai, uploads one invoice from his HUL distributor and three photos of the goods delivered that morning.
>
> In 45 seconds, Kirana Detective finds:
> - Surf Excel is being charged 15.9% above the historical price
> - 4 Coke bottles are missing from the delivery
> - A Parle-G line item appears to be duplicated
>
> **Total leakage detected: ₹858**
>
> Ravi calls his distributor. The credit note is issued the same day.

This outcome is **specific**, **measurable**, and **achievable in a real demo** — exactly what Backyard AI judges want to see.

---

## 10-Day Build Plan

| Day | Task | Model/Tool | Risk |
|---|---|---|---|
| 1 | Fine-tune YOLO26n on Roboflow Indian Grocery dataset | Modal GPU, Ultralytics | Low |
| 2 | Generate 500 synthetic Indian invoices; fine-tune MiniCPM-V 4.6 extraction | Modal + Unsloth | Medium |
| 3 | Fine-tune MiniCPM5-1B product normalizer; publish all 3 models to HF | Modal + Unsloth | Low |
| 4 | Build invoice OCR pipeline in Gradio: upload → MiniCPM-V → JSON | Python + Gradio | Medium |
| 5 | Build YOLO26n delivery counting pipeline: photo → count dict | ONNX Runtime | Medium |
| 6 | Build reconciliation agent + pricing anomaly detection | Rule-based Python | Low |
| 7 | Build custom Gradio dashboard UI with ₹ savings cards | Gradio + CSS | Low |
| 8 | Wire all agents together; implement trace logging; deploy to HF Space | LangGraph / custom | Medium |
| 9 | Test with real kirana owner; record demo video; capture Codex-authored commit/story proof | Codex + real user testing | Low |
| 10 | Write Field Notes blog; share agent trace; include Modal logs and final submission assets | HF Dataset + Modal logs | Low |

---

## Technical Stack Summary

| Component | Technology |
|---|---|
| Frontend | Gradio (custom theme, Off-Brand) |
| Hosting | Hugging Face Spaces |
| Primary VLM | MiniCPM-V 4.6 (GGUF via llama.cpp) |
| Agent Orchestrator | MiniCPM5-1B (GGUF via llama.cpp) |
| Counting Model | YOLO26n fine-tuned (ONNX, local) |
| Fine-tuning Platform | Modal + Unsloth (training engine for sponsor eligibility) |
| Build Agent | OpenAI Codex (commit author + build trace for OpenAI Track positioning) |
| Invoice parsing | PyMuPDF (PDF) + Gradio Image input |
| Data storage | Local JSON / SQLite (no cloud DB) |
| Agent tracing | Custom trace logger → HF Dataset |

---

## Risk Register

| Risk | Likelihood | Mitigation |
|---|---|---|
| MiniCPM-V GGUF has high latency on CPU | Medium | Use 4-bit quantized Q4_K_M; fall back to float16 on HF Space GPU |
| YOLO26n misses products not in Roboflow dataset | Medium | Limit demo to top 10 products; expand post-hackathon |
| Delivery photo quality too low for counting | High | Show demo with clean carton photos; add "photo quality tip" in UI |
| Fine-tuning time exceeds budget | Low | All 3 models trainable in < 6 hours total on Modal A10G |
| OpenAI Track story looks indirect | Medium | Make Codex visible in commit metadata, implementation trace, Field Notes, and demo narrative |
| Modal usage looks incidental | Low | Publish Modal training logs/artifacts and explicitly link fine-tuned models to Modal runs |
| Scope creep during build week | High | Freeze scope at Day 3; no new features after Day 6 |

---

*Kirana Detective AI — Build Small Hackathon 2026 — Track 1: Backyard AI*