Kirana Detective: Hackathon Submission

HuggingFace Build Small Hackathon 2026

| | |
|---|---|
| Space | build-small-hackathon/kirana-detective |
| Demo Video | YouTube |
| Blog Post | How I Built an AI Auditor for India's 12 Million Kirana Stores |
| Social Post | X / Twitter |
| Track | Track 1: Backyard AI |
| Total Parameters | ~2.38B (Tiny Titan ✅) |

Track: Backyard AI

Problem: India's 12 million kirana store owners receive 3–5 distributor invoices per week via WhatsApp, printed bills, or Tally exports. Manual verification is impossible. Distributors overcharge, deliver short quantities, and apply wrong GST rates. A single store loses ₹3,000–₹8,000 per month silently.

Solution: Upload an invoice + delivery photos → receive a ₹ leakage report in under 60 seconds. Every finding (overcharge, shortage, GST error, duplicate) maps to a rupee amount and an action step.

Real user: Tested against real invoice formats from kirana distributors in India (HUL, ITC, Nestlé, Britannia).

Model constraint fit: Entire pipeline runs on CPU, with no GPU required at inference. Designed for Tier 2/3 city deployment where GPU hardware is absent and internet is patchy.


Merit Badges

✅ Off the Grid

Zero cloud API calls. All inference runs locally:

  • MiniCPM-V 4.6 via transformers (merged bfloat16 weights)
  • MiniCPM5-1B via llama-cpp-python (GGUF Q4_K_M)
  • YOLO26n via ONNX Runtime

Invoice data never leaves the device. Suitable for privacy-sensitive business data.

✅ Well-Tuned

Three custom fine-tuned models, each published on HF Hub:

| Model | Repo | Task |
|---|---|---|
| MiniCPM-V 4.6 | build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged | Invoice OCR → structured JSON |
| MiniCPM5-1B | build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer | Product name normalisation + savings report |
| YOLO26n | build-small-hackathon/yolo26n-indian-fmcg-detection | Product counting from delivery photos |

Training dataset: build-small-hackathon/kirana-invoice-train-data, containing 500 synthetic Indian invoices (printed GST, Tally PDF, handwritten, WhatsApp).

✅ Off-Brand

Custom Gradio UI, not the default Gradio look. Features:

  • Rupee savings cards with colour-coded anomaly type (overcharge = red, shortage = amber, duplicate = purple, GST = orange)
  • Agent progress stream with per-agent timing
  • Collapsible raw JSON view per agent
  • Dark/warm colour scheme themed around the kirana store context
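
As an illustration of the colour-coded savings cards, the mapping from anomaly type to card colour could be sketched as below. This is a minimal sketch, not the Space's actual UI code: the `Finding` class, the `render_card` helper, the CSS class name, and the hex values are all assumptions; only the anomaly-to-colour pairing comes from the list above.

```python
from dataclasses import dataclass
from html import escape

# Anomaly type -> card colour. The pairing follows the feature list;
# the specific hex values are illustrative assumptions.
ANOMALY_COLOURS = {
    "overcharge": "#d32f2f",  # red
    "shortage": "#ffa000",    # amber
    "duplicate": "#7b1fa2",   # purple
    "gst": "#f57c00",         # orange
}
FALLBACK_COLOUR = "#616161"   # grey for any unrecognised anomaly type


@dataclass
class Finding:
    """Hypothetical shape of one finding: type, rupee amount, action step."""
    kind: str
    amount_inr: float
    action: str


def render_card(finding: Finding) -> str:
    """Return an HTML snippet for one rupee savings card."""
    colour = ANOMALY_COLOURS.get(finding.kind, FALLBACK_COLOUR)
    return (
        f'<div class="savings-card" style="border-left: 6px solid {colour}">'
        f"<strong>₹{finding.amount_inr:,.0f}</strong> "
        f"<span>{escape(finding.kind)}</span>"
        f"<p>{escape(finding.action)}</p>"
        "</div>"
    )
```

A snippet like this could be passed to a `gr.HTML` component; escaping the text fields keeps product names containing `<` or `&` from breaking the markup.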

✅ Llama Champion

MiniCPM5-1B is served entirely via llama-cpp-python using a GGUF Q4_K_M quantised model. Used for both Agent 2 (product normalisation) and Agent 6 (savings report generation). No transformers at runtime for these two agents: pure llama.cpp.

✅ Sharing is Caring

11 raw Claude Code (Sonnet 4.6) JSONL build sessions published as a public trace dataset, viewable in HF Data Studio's native agent trace viewer:

build-small-hackathon/kirana-detective-build-traces: complete design, coding, debugging, and documentation sessions from blank repo to submission.

✅ Field Notes

Full blog post: How I Built an AI Auditor for India's 12 Million Kirana Stores

Covers: the problem, the 6-agent pipeline design, all three fine-tuned models with training details, the local-inference rationale, the hardest bug, and the full stack.


Prize Categories Targeted

Special Awards

🏋️ Tiny Titan: ~2.38B total parameters across all three models combined.

| Component | Parameters |
|---|---|
| MiniCPM-V 4.6 (merged bfloat16) | ~1.3B |
| MiniCPM5-1B (GGUF Q4_K_M) | ~1.08B |
| YOLO26n (ONNX) | ~2.4M |
| Total | ~2.38B |

Well within the ≤4B Tiny Titan threshold.
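
The total can be checked with a few lines of arithmetic. The sketch below uses the approximate per-model counts from the table; the variable names are illustrative.

```python
# Approximate parameter counts as listed in the table above.
PARAMS = {
    "MiniCPM-V 4.6": 1.3e9,
    "MiniCPM5-1B": 1.08e9,
    "YOLO26n": 2.4e6,
}
TINY_TITAN_LIMIT = 4e9  # 4B threshold stated above

total = sum(PARAMS.values())
total_billions = total / 1e9

print(f"Total: ~{total_billions:.2f}B parameters")
print(f"Under Tiny Titan limit: {total < TINY_TITAN_LIMIT}")
```

YOLO26n contributes only about 0.1% of the total, so the budget is dominated by the two MiniCPM models.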

🤖 Best Agent: Fully modular 6-agent pipeline. Each agent has a single responsibility, a defined input/output contract, and produces an AgentTraceEntry with timing. Generator-based streaming shows live agent progress in the UI.
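
The trace-plus-streaming pattern described here could look roughly like the following. This is a sketch under assumptions, not the project's code: only the name `AgentTraceEntry` comes from the text, while its fields and the `run_pipeline` function are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Iterator


@dataclass
class AgentTraceEntry:
    """One trace record per agent. Field names are assumed for illustration."""
    agent: str
    elapsed_s: float
    output: Any


def run_pipeline(
    agents: list[tuple[str, Callable[[Any], Any]]],
    payload: Any,
) -> Iterator[AgentTraceEntry]:
    """Run agents in order, yielding a timed trace entry after each one.

    Because this is a generator, a caller (for example a Gradio event
    handler) can update the UI as soon as each agent finishes instead of
    waiting for the whole pipeline.
    """
    for name, agent_fn in agents:
        start = time.perf_counter()
        payload = agent_fn(payload)  # each agent consumes the previous output
        yield AgentTraceEntry(name, time.perf_counter() - start, payload)
```

A caller would iterate over the generator and render each entry as it arrives:

```python
for entry in run_pipeline([("double", lambda x: x * 2), ("increment", lambda x: x + 1)], 3):
    print(f"{entry.agent}: {entry.output} ({entry.elapsed_s:.4f}s)")
```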

🎨 Off-Brand β€” Custom Gradio UI with rupee savings cards, colour-coded anomaly flags, and agent-by-agent progress stream. Distinctly different from the default Gradio look.

📊 Best Demo: End-to-end demo covering invoice upload → extraction → normalisation → price check → delivery photo counting → shortage reconciliation → ₹ savings report with action items.

🎖️ Bonus Quest Champion: all 6 merit badges claimed on a single submission.

| # | Badge | Evidence |
|---|---|---|
| 1 | Off the Grid | Zero cloud API calls: MiniCPM-V (transformers) + MiniCPM5-1B (llama.cpp) + YOLO26n (ONNX), all CPU |
| 2 | Well-Tuned | 3 custom fine-tuned models published on HF Hub (MiniCPM-V 4.6, MiniCPM5-1B, YOLO26n) |
| 3 | Off-Brand | Custom Gradio UI: rupee savings cards, colour-coded anomaly flags, per-agent streaming progress |
| 4 | Llama Champion | MiniCPM5-1B served via llama-cpp-python (GGUF Q4_K_M) for both Agent 2 and Agent 6 |
| 5 | Sharing is Caring | 11 Claude Code build sessions published at build-small-hackathon/kirana-detective-build-traces |
| 6 | Field Notes | Blog post at huggingface.co/blog/build-small-hackathon/kirana-detective |

Full sash. All badges earned independently, each with verifiable evidence.

🗳️ Community Choice: Kirana Detective is built around a problem that 12 million Indian shopkeepers face every week. It is demonstrable to anyone who has ever received a bill they couldn't verify, which is most of the world. The live Space requires no setup or account and produces a tangible rupee figure in under a minute. The blog post and X post are live for community sharing, and community votes are welcome.

Sponsor Awards

OpenBMB ($10,000 pool)

Both language models are from OpenBMB's MiniCPM family:

  • MiniCPM-V 4.6 (openbmb/MiniCPM-V-4.6): fine-tuned for Indian invoice extraction
  • MiniCPM5-1B (openbmb/MiniCPM5-1B): fine-tuned for FMCG product normalisation and report generation

Both are fine-tuned, pushed to HF Hub, and used in production in the Space. MiniCPM5-1B runs as GGUF via llama.cpp (cross-qualifying for Llama Champion badge).

Modal ($20,000 in credits)

All three models were trained on Modal A10G GPUs using Modal's @app.function decorator with GPU provisioning. Total compute: ~4.5 hours of A10G time, ~$5.80 total cost.

Training scripts in finetune/:

  • finetune/train_minicpm_v.py: MiniCPM-V 4.6 fine-tuning (51 min, A10G)
  • finetune/train_minicpm5_1b.py: MiniCPM5-1B fine-tuning (~1 hr, A10G)
  • finetune/train_yolo26n.py: YOLO26n fine-tuning (~2 hrs, A10G)
  • finetune/generate_invoices.py: synthetic invoice generation (Modal function)
  • finetune/export_minicpm_v_gguf.py: LoRA merge + GGUF export (Modal function)

NVIDIA

YOLO26n is exported to ONNX and can use NVIDIA GPU acceleration through ONNX Runtime's CUDA and TensorRT execution providers when they are available, falling back to CPU otherwise. The A10G GPUs used for all training are NVIDIA hardware.
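
The GPU-when-available, CPU-otherwise behaviour can be sketched as a provider preference list. The provider names and the `get_available_providers()` / `InferenceSession(..., providers=...)` calls are standard ONNX Runtime API; the `choose_providers` helper and the model filename in the commented usage are assumptions, not taken from the project.

```python
# Preferred execution providers, fastest first. These are the standard
# ONNX Runtime provider identifiers.
PREFERRED_PROVIDERS = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def choose_providers(available: list[str]) -> list[str]:
    """Keep the preferred providers that are actually available, in order.

    Falls back to CPU alone if none of the preferred providers is reported.
    """
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    return chosen or ["CPUExecutionProvider"]


# Usage with ONNX Runtime installed (model path is hypothetical):
#
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   session = ort.InferenceSession("yolo26n.onnx", providers=providers)
```

On a CPU-only machine, `get_available_providers()` typically reports only the CPU provider, so the same code path serves both deployments.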


Six-Agent Pipeline Summary

Agent 1: Invoice Extractor      MiniCPM-V 4.6 (OpenBMB, fine-tuned)     →  Structured JSON
Agent 2: Product Matcher        MiniCPM5-1B (OpenBMB, GGUF, llama.cpp)  →  Canonical SKU names
Agent 3: Pricing Agent          Rule-based (SQLite history)             →  Price / GST flags
Agent 4: Visual Counter         YOLO26n (ONNX Runtime)                  →  Product counts
Agent 5: Reconciliation Agent   Rule-based                              →  Shortage flags + ₹ loss
Agent 6: Savings Agent          MiniCPM5-1B (OpenBMB, GGUF, llama.cpp)  →  ₹ report + actions
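
To make the two rule-based agents concrete, here is a minimal sketch of how a price check against SQLite history (Agent 3) and a shortage reconciliation (Agent 5) might work. Everything in it is assumed for illustration: the `price_history` schema, the 2% tolerance, the function names, the dictionary keys, and the sample SKU are not taken from the project.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory price history with one made-up record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE price_history (sku TEXT, unit_price REAL, invoice_date TEXT)"
    )
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        ("noodles-70g", 12.0, "2026-01-05"),
    )
    return conn


def price_flags(conn, lines, tolerance=0.02):
    """Flag lines priced above the most recent historical price plus tolerance."""
    flags = []
    for line in lines:
        row = conn.execute(
            "SELECT unit_price FROM price_history WHERE sku = ? "
            "ORDER BY invoice_date DESC LIMIT 1",
            (line["sku"],),
        ).fetchone()
        if row is None:
            continue  # no history for this SKU, nothing to compare against
        previous = row[0]
        if line["unit_price"] > previous * (1 + tolerance):
            loss = (line["unit_price"] - previous) * line["qty"]
            flags.append(
                {"sku": line["sku"], "type": "overcharge", "loss_inr": round(loss, 2)}
            )
    return flags


def shortage_flags(lines, counted):
    """Compare invoiced quantities with counted units and price the gap."""
    flags = []
    for line in lines:
        missing = line["qty"] - counted.get(line["sku"], 0)
        if missing > 0:
            flags.append(
                {
                    "sku": line["sku"],
                    "type": "shortage",
                    "missing": missing,
                    "loss_inr": round(missing * line["unit_price"], 2),
                }
            )
    return flags
```

With an invoice line of 48 units at ₹13 against a historical price of ₹12, and only 40 units counted in the delivery photo, the two checks report the overcharge and the shortage separately:

```python
invoice_lines = [{"sku": "noodles-70g", "qty": 48, "unit_price": 13.0}]
print(price_flags(make_demo_db(), invoice_lines))
print(shortage_flags(invoice_lines, {"noodles-70g": 40}))
```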

Constraints Met

| Constraint | Status |
|---|---|
| Models ≤ 32B parameters | ✅ ~2.38B total |
| Gradio UI | ✅ Custom Gradio 6.16 |
| Hosted as HF Space | ✅ build-small-hackathon/kirana-detective |
| Demo video | ✅ YouTube |
| Social media post | ✅ X post |

Links