# Kirana Detective AI
## AI-Powered Inventory & Invoice Auditor for Indian Kirana Stores
| Field | Value |
|---|---|
| Version | MVP v1.0 |
| Hackathon | Hugging Face Build Small Hackathon 2026 |
| Track | Track 1: Backyard AI |
| Deadline | June 15, 2026 |
---
## Executive Summary
Kirana Detective AI helps Indian kirana store owners detect profit leakage by automatically auditing invoices, validating deliveries, identifying pricing anomalies, and comparing invoice quantities against actual products visible in shelf or carton photos.
The system acts as an AI-powered business auditor that helps small retailers identify billing errors, missing products, supplier discrepancies, and inventory issues that would otherwise go unnoticed.
Unlike generic AI assistants, Kirana Detective solves a highly specific problem for a clearly defined user group and produces measurable financial value: a rupee savings number that is concrete, judge-friendly, and immediately relatable to any Indian evaluator.
---
## Problem Statement
India has approximately 12 million kirana stores. Most operate with:
- Printed invoices from distributors
- WhatsApp invoice screenshots
- Manual or no delivery verification
- Informal bookkeeping
### Common Loss Sources
| Issue | Example |
|---|---|
| Supplier overcharging | Charged ₹255 for Surf Excel, should be ₹220 |
| Missing delivery items | Invoice says 50 Coke bottles, 46 delivered |
| Incorrect GST applied | Aashirvaad Atta at 12% instead of 5% |
| Duplicate invoice lines | Same product charged twice |
| Unclaimed distributor discounts | Buy-10-get-1 offer never applied |
| Dead inventory | Corn Flakes unsold for 75 days |
Each mistake is small, but monthly losses accumulate to ₹2,000–₹20,000 per store.
Store owners rarely have time to manually audit invoices and deliveries. **Kirana Detective becomes their AI auditor.**
---
## Vision
> "Find where money is being lost."
The goal is not accounting. The goal is detecting profit leakage and converting every finding into a rupee value.
---
## Primary User
**Ravi, Kirana Store Owner, Chennai**
- Runs a neighbourhood provision store
- Receives 3–5 distributor invoices per week
- Gets most invoices via WhatsApp
- Uses an Android phone
- Low technical skill; needs a tap-and-see interface
- Loses approximately ₹3,000–₹8,000/month to undetected billing errors
---
## Success Metrics
| Metric | Target |
|---|---|
| Detected savings per audit | ≥ ₹500 shown to user |
| Invoice audit time | < 60 seconds |
| Delivery verification accuracy | ≥ 80% on carton photos |
| "Actually used it" proof | Demo video with real kirana owner |
---
## MVP Scope (Must-Build for Hackathon)
Focus ruthlessly on this single killer workflow:
```
Invoice Upload → Delivery Photo Upload → Missing Product Detection → ₹ Savings Report
```
### Must Have
- ✅ Invoice image / PDF upload
- ✅ Invoice OCR and structured extraction
- ✅ Product name normalization
- ✅ Price anomaly detection vs. historical invoices
- ✅ Delivery photo upload and product counting (YOLO26n)
- ✅ Invoice vs. delivery reconciliation
- ✅ Profit leakage dashboard with ₹ savings total
- ✅ Agent trace logging (for Sharing is Caring badge)
- ✅ Custom Gradio UI (not default theme)
### Deferred to Future
- Expiry date detection
- Dead stock / slow-moving inventory alerts
- Supplier trust score
- Supplier negotiation insights
- Multi-store analytics
- WhatsApp bot integration
- Demand forecasting
---
## Core Features (MVP)
### Feature 1: Invoice Understanding
**Input:** Invoice image (photo, PDF, WhatsApp screenshot)
**Model:** MiniCPM-V 4.6 (fine-tuned on Indian invoice formats)
**AI Tasks:**
- OCR extraction of all invoice fields
- Handling mixed English + Tamil/Hindi/Telugu text
- Parsing Tally printouts, handwritten bills, GST invoices
**Output: Structured Invoice JSON**
```json
{
"invoice_number": "INV-2024-8821",
"supplier": "Hindustan Unilever Ltd",
"date": "2026-06-08",
"items": [
{
"product_raw": "SURF EXCEL 1KG",
"product_normalized": "Surf Excel Washing Powder 1kg",
"quantity": 10,
"unit_price": 255.00,
"gst_rate": 18,
"line_total": 2550.00
}
],
"grand_total": 2550.00
}
```
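Because OCR output can misread digits, a cheap arithmetic check on the extracted JSON is a useful guard before any downstream audit. The sketch below is an illustration, not part of the specified pipeline: the function name `check_invoice_totals` and the tolerance are hypothetical, and it assumes `line_total` equals `quantity × unit_price` (as in the sample above) with `grand_total` being the sum of line totals.

```python
from decimal import Decimal


def check_invoice_totals(invoice, tolerance=Decimal("0.01")):
    """Return a list of arithmetic inconsistencies found in an extracted invoice."""
    problems = []
    running_total = Decimal("0")
    for idx, item in enumerate(invoice["items"], start=1):
        # Assumption: line_total = quantity * unit_price (no per-line tax added).
        expected = Decimal(str(item["quantity"])) * Decimal(str(item["unit_price"]))
        actual = Decimal(str(item["line_total"]))
        if abs(expected - actual) > tolerance:
            problems.append(f"line {idx}: expected {expected}, found {actual}")
        running_total += actual
    grand_total = Decimal(str(invoice["grand_total"]))
    if abs(running_total - grand_total) > tolerance:
        problems.append(f"grand_total: expected {running_total}, found {grand_total}")
    return problems


sample = {
    "invoice_number": "INV-2024-8821",
    "items": [
        {
            "product_normalized": "Surf Excel Washing Powder 1kg",
            "quantity": 10,
            "unit_price": 255.00,
            "line_total": 2550.00,
        }
    ],
    "grand_total": 2550.00,
}
print(check_invoice_totals(sample))  # []
```

Any non-empty result is a signal to re-run extraction or ask the user to confirm the figures before computing savings.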
---
### Feature 2: Product Name Normalization
**Problem:** Distributor invoices use inconsistent product names.
| Invoice Text | Normalized Name |
|---|---|
| MAGGI 70GM | Nestle Maggi Masala Noodles 70g |
| MAGGI NDL | Nestle Maggi Masala Noodles 70g |
| SURF XL 1K | Surf Excel Washing Powder 1kg |
| PARLE G 80 | Parle-G Biscuit 80g |
| COLGAT 100G | Colgate Strong Teeth Toothpaste 100g |
**Model:** Fine-tuned MiniCPM5-1B on Indian FMCG SKU normalization dataset
**Output:** Consistent product catalog entries that allow historical price comparisons across different invoices from the same supplier.
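As a point of comparison for the fine-tuned normalizer, a non-ML baseline can be sketched with an alias table plus fuzzy matching. This is a swapped-in technique, not the MiniCPM5-1B approach described above: the `ALIASES` table only contains the rows from the table above, and the function name and the 0.8 cutoff are assumptions.

```python
from difflib import get_close_matches

# Alias table built from the examples above; a real catalog would be far larger.
ALIASES = {
    "MAGGI 70GM": "Nestle Maggi Masala Noodles 70g",
    "MAGGI NDL": "Nestle Maggi Masala Noodles 70g",
    "SURF XL 1K": "Surf Excel Washing Powder 1kg",
    "PARLE G 80": "Parle-G Biscuit 80g",
    "COLGAT 100G": "Colgate Strong Teeth Toothpaste 100g",
}


def normalize_name(raw, cutoff=0.8):
    """Map a raw invoice string to a catalog name, or None if nothing is close."""
    key = " ".join(raw.upper().split())  # uppercase and collapse whitespace
    if key in ALIASES:
        return ALIASES[key]
    # Fall back to the closest known alias by character similarity.
    match = get_close_matches(key, list(ALIASES), n=1, cutoff=cutoff)
    return ALIASES[match[0]] if match else None


print(normalize_name("maggi  ndl"))
print(normalize_name("COLGATE 100G"))
print(normalize_name("XYZ"))
```

A baseline like this also gives a simple yardstick for measuring whether the fine-tuned model actually improves on string matching.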
---
### Feature 3: Pricing Anomaly Detection
**Logic:** Rule-based comparison against stored historical invoice data.
**Example:**
```
Product: Surf Excel Washing Powder 1kg
Historical price (last 3 invoices):
₹220 | ₹220 | ₹222
Current invoice price: ₹255
⚠ Price increase detected: +15.9%
Estimated excess charge (10 units): ₹330
```
> No ML needed here: arithmetic plus a historical lookup is sufficient, and more trustworthy than a model for financial comparisons.
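A minimal sketch of the rule is shown below. The baseline is an assumption: the example does not say whether the comparison uses the earliest, latest, or median historical price. With the median (₹220) the sketch reproduces the +15.9% figure, and the excess for 10 units comes to ₹350; a baseline of the latest price (₹222) gives ₹330 instead. The function name and the 5% threshold are hypothetical.

```python
from statistics import median


def price_anomaly(history, current_price, quantity, threshold_pct=5.0):
    """Flag a price that exceeds the historical baseline by more than threshold_pct."""
    baseline = median(history)  # assumption: median of recent invoice prices
    increase_pct = (current_price - baseline) / baseline * 100
    if increase_pct <= threshold_pct:
        return None
    return {
        "baseline": baseline,
        "increase_pct": round(increase_pct, 1),
        "excess_charge": round((current_price - baseline) * quantity, 2),
    }


flag = price_anomaly([220, 220, 222], current_price=255, quantity=10)
print(flag)  # {'baseline': 220, 'increase_pct': 15.9, 'excess_charge': 350}
```

Whichever baseline is chosen, it should be shown to the user alongside the flag so the rupee figure can be checked by hand.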
---
### Feature 4: Duplicate Charge Detection
**Logic:** Rule-based scan of extracted invoice JSON.
**Detects:**
- Same product appearing twice in one invoice
- Same invoice number submitted twice across sessions
- Repeated line items with identical product + qty + price
**Output:**
```
⚠ Duplicate detected: Parle-G 80g appears twice on this invoice.
Combined quantity: 40 units | Possible duplicate charge: ₹320
```
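The identical-line rule can be sketched as a grouping pass over the extracted items. The field names follow the invoice JSON from Feature 1; the function name is hypothetical, and the ₹16 unit price is an assumption inferred from the ₹320 figure above (20 units per line).

```python
from collections import defaultdict


def find_duplicate_lines(items):
    """Report line items that repeat with identical product, quantity and price."""
    groups = defaultdict(list)
    for item in items:
        key = (item["product_normalized"], item["quantity"], item["unit_price"])
        groups[key].append(item)

    findings = []
    for (product, qty, price), lines in groups.items():
        if len(lines) > 1:
            extra_lines = len(lines) - 1  # every repeat beyond the first is suspect
            findings.append({
                "product": product,
                "combined_quantity": qty * len(lines),
                "possible_duplicate_charge": qty * price * extra_lines,
            })
    return findings


items = [
    {"product_normalized": "Parle-G Biscuit 80g", "quantity": 20, "unit_price": 16},
    {"product_normalized": "Surf Excel Washing Powder 1kg", "quantity": 10, "unit_price": 255},
    {"product_normalized": "Parle-G Biscuit 80g", "quantity": 20, "unit_price": 16},
]
duplicates = find_duplicate_lines(items)
print(duplicates)
```

A repeated line is only a possible duplicate, since a distributor may legitimately split an order across lines, which is why the output is worded as a prompt to verify.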
---
### Feature 5: Delivery Verification (Visual Counting)
**This is the centrepiece feature and the most visually impressive part of the demo.**
**Input:** Invoice JSON (from Feature 1) + 1–5 delivery photos
**Model:** YOLO26n fine-tuned on Indian FMCG products (see Model Stack section)
**Pipeline:**
```
Delivery Photo
      ↓
YOLO26n (ONNX, local)
  → Detect bounding boxes
  → Count instances per product class
  → Output: {Coke 200ml: 20, Maggi 70g: 48}
      ↓
MiniCPM-V 4.6
  → Cross-verify with invoice context
  → Generate natural-language summary
      ↓
Reconciliation Agent
  → Invoice qty vs detected qty
  → Calculate ₹ shortage value
```
**Example Output:**
```
Invoice expects: Coke 200ml × 24
Detected in photo: 20 bottles
⚠ Shortage: 4 bottles
Estimated loss: ₹180
```
**Important scope note:** Multi-image counting (Feature 6 in the original PRD) is simplified: the user uploads up to 5 photos of the same delivery, counts are aggregated, then reconciled against the invoice. No complex carton-stacking estimation is attempted.
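The aggregate-then-reconcile step can be sketched as follows. It assumes each photo shows a different part of the delivery, so per-photo counts can simply be summed; overlapping photos would double-count. The function names are hypothetical, and the ₹45 unit price is inferred from the ₹180 loss for 4 bottles above.

```python
from collections import Counter


def aggregate_counts(per_photo_counts):
    """Sum per-photo {product: count} dicts into one delivery-wide count."""
    total = Counter()
    for counts in per_photo_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing
    return total


def reconcile(invoice_items, detected):
    """Compare invoiced quantities with detected counts and price the shortfall."""
    shortages = []
    for item in invoice_items:
        name = item["product_normalized"]
        missing = item["quantity"] - detected.get(name, 0)
        if missing > 0:
            shortages.append({
                "product": name,
                "missing": missing,
                "loss": missing * item["unit_price"],
            })
    return shortages


photos = [{"Coke 200ml": 12}, {"Coke 200ml": 8}]
invoice_items = [{"product_normalized": "Coke 200ml", "quantity": 24, "unit_price": 45}]
detected = aggregate_counts(photos)
shortages = reconcile(invoice_items, detected)
print(shortages)  # [{'product': 'Coke 200ml', 'missing': 4, 'loss': 180}]
```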
---
### Feature 6: Profit Leakage Dashboard
**The "wow" output: everything converts to ₹.**
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
KIRANA DETECTIVE: AUDIT REPORT
Supplier: HUL | Invoice: INV-8821
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠ Pricing Issues
  Surf Excel 1kg: +15.9% vs history ....... ₹330
⚠ Delivery Shortage
  Coke 200ml: 4 bottles missing ........... ₹180
  Maggi 70g: 2 packets missing ............. ₹28
⚠ Duplicate Charge
  Parle-G 80g: possible duplicate ......... ₹320
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💰 TOTAL LEAKAGE DETECTED: ₹858
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Actions:
→ Contact HUL rep about price increase
→ Request credit note for 4 Coke bottles
→ Verify Parle-G line item with distributor
```
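The headline figure is a plain sum over the individual findings. The sketch below uses the amounts from the report above; the `category`/`item`/`amount` record shape and the function name are assumptions for illustration.

```python
from collections import defaultdict


def summarize(findings):
    """Return (per-category totals, grand total) for a list of audit findings."""
    by_category = defaultdict(int)
    for finding in findings:
        by_category[finding["category"]] += finding["amount"]
    return dict(by_category), sum(by_category.values())


findings = [
    {"category": "pricing", "item": "Surf Excel 1kg", "amount": 330},
    {"category": "shortage", "item": "Coke 200ml", "amount": 180},
    {"category": "shortage", "item": "Maggi 70g", "amount": 28},
    {"category": "duplicate", "item": "Parle-G 80g", "amount": 320},
]
per_category, total = summarize(findings)
print(per_category)  # {'pricing': 330, 'shortage': 208, 'duplicate': 320}
print(total)         # 858
```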
---
## AI Agent Workflow
This multi-agent pipeline is explicitly designed for the **Best Agent** award.
```
┌─────────────────────────────────────────┐
│              USER UPLOADS               │
│     Invoice Image + Delivery Photos     │
└────────────────────┬────────────────────┘
                     ▼
    ┌──────────────────────────────────┐
    │ Agent 1: Invoice Extraction      │
    │ Model: MiniCPM-V 4.6 (ft)        │
    │ Input: Invoice image/PDF         │
    │ Output: Structured invoice JSON  │
    └────────────────┬─────────────────┘
                     ▼
    ┌──────────────────────────────────┐
    │ Agent 2: Product Matching        │
    │ Model: MiniCPM5-1B (ft)          │
    │ Input: Raw product names         │
    │ Output: Normalized product IDs   │
    └────────────────┬─────────────────┘
                     ▼
    ┌──────────────────────────────────┐
    │ Agent 3: Pricing Agent           │
    │ Logic: Rule-based                │
    │ Input: Normalized invoice        │
    │ Output: Price anomaly flags      │
    └────────────────┬─────────────────┘
                     ▼
    ┌──────────────────────────────────┐
    │ Agent 4: Visual Counting Agent   │
    │ Model: YOLO26n-FMCG (ft)         │
    │ Input: Delivery photos           │
    │ Output: {product: count} dict    │
    └────────────────┬─────────────────┘
                     ▼
    ┌──────────────────────────────────┐
    │ Agent 5: Reconciliation Agent    │
    │ Logic: Rule-based                │
    │ Input: Invoice qty + Photo qty   │
    │ Output: Shortage flags + ₹ loss  │
    └────────────────┬─────────────────┘
                     ▼
    ┌──────────────────────────────────┐
    │ Agent 6: Savings Agent           │
    │ Model: MiniCPM5-1B               │
    │ Input: All flags                 │
    │ Output: ₹ report + action items  │
    └──────────────────────────────────┘
```
The agent trace is logged and shared on the Hugging Face Hub, which targets the **Sharing is Caring badge**.
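One simple way to capture such a trace is a JSON Lines file with one record per agent step. The sketch below is an assumption about the trace format, not the project's actual logger: the `TraceLogger` class, its field names, and the sample step names are hypothetical, and uploading the file to the Hub is left out.

```python
import json
import tempfile
import time
from pathlib import Path


class TraceLogger:
    """Append one JSON object per agent step to a JSON Lines file."""

    def __init__(self, path):
        self.path = Path(path)

    def log(self, agent, inputs, outputs):
        record = {"ts": time.time(), "agent": agent, "inputs": inputs, "outputs": outputs}
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        return record


# Usage: write two steps to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "trace.jsonl"
    logger = TraceLogger(trace_path)
    logger.log("invoice_extraction", {"file": "invoice.jpg"}, {"items": 1})
    logger.log("reconciliation", {"invoice_qty": 24, "photo_qty": 20}, {"missing": 4})
    steps = [json.loads(line) for line in trace_path.read_text(encoding="utf-8").splitlines()]

print([step["agent"] for step in steps])
```

An append-only format keeps each audit run readable line by line and is easy to publish as a dataset file afterwards.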
---
## Model Stack
### Primary Vision Model: MiniCPM-V 4.6
| Property | Value |
|---|---|
| Developer | OpenBMB (Tsinghua University) |
| Parameters | 1.3B |
| Release status | Current MiniCPM-V 4.6 family model; released in 2026 |
| Strengths | OCR-heavy document understanding, image/video inputs, edge-friendly multimodal reasoning |
| Architecture note | SigLIP2-400M vision encoder + Qwen3.5-0.8B LLM |
| GGUF / local support | Yes; supports llama.cpp/GGUF deployment for Off the Grid + Llama Champion badges |
| Why chosen | Best fit for invoice OCR under the 32B cap and directly targets OpenBMB sponsor prize |
**Tasks:** Invoice OCR, final report generation, cross-verification narration
---
### Counting Model: YOLO26n (Fine-tuned)
| Property | Value |
|---|---|
| Developer | Ultralytics |
| Parameters | ~2.4M fused model |
| Strengths | Faster CPU ONNX inference than YOLO11n, accurate object detection + counting, edge-friendly |
| Export format | ONNX (local inference, no llama.cpp needed) |
| Why chosen | Latest Ultralytics nano detector; a dedicated detector is better suited to counting, whereas VLMs tend to hallucinate counts on dense product scenes |
**Tasks:** Detect and count FMCG products in delivery photos
> **Design decision:** YOLO26n handles counting because VLMs like MiniCPM-V underperform on dense shelf scenes with 20–50 identical objects. Each model does what it does best.
---
### Agent Orchestration Model: MiniCPM5-1B
| Property | Value |
|---|---|
| Developer | OpenBMB |
| Parameters | 1.08B |
| Context length | 131,072 tokens |
| Strengths | Tool use, reasoning, code/JSON generation, workflow orchestration, report generation |
| GGUF support | Yes; official GGUF release supports llama.cpp/Ollama/LM Studio workflows |
| Why chosen | Current OpenBMB 1B-class model, better aligned than the older MiniCPM3 reference and strengthens OpenBMB prize positioning |
**Tasks:** Product normalization, agent orchestration, savings report text generation
---
### Parameter Budget
| Component | Model | Parameters |
|---|---|---|
| Invoice/document vision | MiniCPM-V 4.6 | 1.3B |
| Product normalization + agent text | MiniCPM5-1B | 1.08B |
| Product detection/counting | YOLO26n | ~2.4M |
| Total active model budget | Combined stack | ~2.38B |
This keeps the app far below the hackathon's 32B cap and within the Tiny Titan special-award range (<=4B), while still using separate models for the tasks they handle best.
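The budget arithmetic can be checked in a few lines. The parameter counts are taken from the table above; the 4B and 32B limits are the thresholds quoted in the same paragraph.

```python
# Parameter counts as listed in the Parameter Budget table.
PARAMS = {
    "MiniCPM-V 4.6": 1.3e9,
    "MiniCPM5-1B": 1.08e9,
    "YOLO26n": 2.4e6,
}

total = sum(PARAMS.values())
print(f"{total / 1e9:.2f}B")  # 2.38B

TINY_TITAN_LIMIT = 4e9   # <=4B special-award range
HACKATHON_CAP = 32e9     # overall model-size cap
print(total <= TINY_TITAN_LIMIT, total <= HACKATHON_CAP)  # True True
```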
### Current Model References
- MiniCPM-V 4.6: `openbmb/MiniCPM-V-4.6`
- MiniCPM5-1B: `openbmb/MiniCPM5-1B` and `openbmb/MiniCPM5-1B-GGUF`
- YOLO26n: Ultralytics YOLO26 nano detector, exported to ONNX after fine-tuning
---
## Fine-Tuning Strategy
### What to Fine-Tune (and Why)
#### 1. MiniCPM-V 4.6: Invoice Extraction
**Why:** Indian invoice formats (Tally printouts, WhatsApp screenshots, handwritten GST bills, mixed-language text) are not well-represented in the base model's training data. Fine-tuning on 300–500 synthetic Indian invoices is expected to make the structured JSON output markedly more reliable.
**Dataset:** Synthetically generated using Claude/GPT; 500 invoices covering:
- 10 major Indian FMCG suppliers (HUL, Nestlé, Parle, Britannia, ITC, Amul, Dabur, Marico, Emami, Godrej)
- 4 invoice formats (printed GST bill, handwritten, Tally export, WhatsApp screenshot)
- Intentional errors: wrong GST, duplicate lines, price spikes
**Platform:** Modal + Unsloth QLoRA (~2–3 hours training time)
**Publish to:** `build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`
---
#### 2. YOLO26n: Indian FMCG Product Detection
**Why:** Base YOLO26n is not trained on Indian grocery products. Fine-tuning on the existing Indian Grocery Object Detection dataset (Roboflow) gives the model the ability to reliably detect Parle-G, Maggi, Amul, Britannia, HUL products in kirana shelf/delivery photos.
**Dataset:** [Indian Grocery Object Detection (Roboflow)](https://universe.roboflow.com/agentsk47/indian-grocery-object-detection-mfsnx), already annotated with bounding boxes for common Indian FMCG SKUs.
**Training:** Ultralytics fine-tune on Modal GPU (~1–2 hours)
**Export:** ONNX for local CPU inference
**Publish to:** `build-small-hackathon/yolo26n-indian-fmcg-detection`
---
#### 3. MiniCPM5-1B: Product Name Normalization
**Why:** "MAGGI NDL 70GM", "MAGGI MASALA", and "MAGGI 70G" should all map to "Nestle Maggi Masala Noodles 70g". This requires Indian FMCG domain knowledge a general 1B model lacks.
**Dataset:** 2,000 synthetic (raw_name, normalized_name) pairs covering top 200 Indian FMCG SKUs
**Publish to:** `build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`
---
### What NOT to Fine-Tune
| Task | Why Not |
|---|---|
| GST rate validation | Pure lookup table by HSN code. 0/5/12/18/28%. Deterministic. |
| Price anomaly detection | Simple arithmetic vs. stored history. More trustworthy without ML. |
| Duplicate detection | String matching + invoice ID comparison. |
| Savings calculation | Arithmetic. No model needed. |
| Supplier trust scoring | Aggregation of existing rule-based signals. |
---
## Indian Context β Training Data Coverage
### FMCG Brands (Invoice Normalization Dataset)
**Food:**
Parle-G, Good Day, Britannia Marie, Maggi, Yippee, Aashirvaad Atta, Tata Salt, Amul Butter, Mother Dairy, Aavin
**Home Care:**
Surf Excel, Rin, Vim, Harpic, Lizol, Domex, Scotch-Brite, Mortein
**Personal Care:**
Colgate, Pepsodent, Clinic Plus, Pantene, Lux, Dove, Lifebuoy, Dettol, Parachute
**Beverages:**
Coca-Cola, Pepsi, Sprite, Thums Up, Frooti, Maaza, Bovonto (South India)
### GST Rate Lookup (Rule-Based, Not Fine-Tuned)
| Rate | Example Products |
|---|---|
| 0% | Fresh milk, eggs, vegetables |
| 5% | Packaged food, Atta, Dal, edible oil |
| 12% | Butter, ghee, packaged dry fruits |
| 18% | Soap, shampoo, toothpaste, detergent |
| 28% | Aerated drinks, tobacco |
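The lookup itself can be sketched as a dictionary check. The rates below are copied from the table above and should be verified against the current official GST schedule before use; the category keys, the function name, and the ₹1,000 taxable value are hypothetical, and a production version would key on HSN codes rather than broad categories.

```python
# Rates copied from the table above; verify against the official schedule.
GST_RATES = {
    "fresh_produce": 0,
    "packaged_food": 5,
    "butter_ghee": 12,
    "personal_home_care": 18,
    "aerated_drinks": 28,
}


def check_gst(category, billed_rate, taxable_value):
    """Return a mismatch record if the billed rate differs from the lookup rate."""
    expected = GST_RATES.get(category)
    if expected is None or billed_rate == expected:
        return None  # unknown category or correct rate: nothing to flag
    excess_tax = taxable_value * (billed_rate - expected) / 100
    return {
        "expected_rate": expected,
        "billed_rate": billed_rate,
        "excess_tax": round(excess_tax, 2),  # negative if under-billed
    }


# Atta billed at 12% instead of 5% on a taxable value of 1,000 rupees.
mismatch = check_gst("packaged_food", billed_rate=12, taxable_value=1000)
print(mismatch)  # {'expected_rate': 5, 'billed_rate': 12, 'excess_tax': 70.0}
```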
### Regional Language Support
Invoice OCR handles mixed-language text including English, Tamil, Hindi, and Telugu β common in South Indian distributor invoices.
---
## Award Strategy
### OpenBMB Award
**How:** MiniCPM-V 4.6 is the primary vision model for OCR, cross-verification, and report generation. MiniCPM5-1B handles orchestration, normalization, and report text. Both are current OpenBMB models, making the product visibly built around the sponsor's ecosystem.
### OpenAI Track
**How:** The project is built with Codex as the primary coding agent, with Codex-authored commits and implementation traces included in the submission materials. The demo should explicitly show how Codex accelerated the build and helped produce the final Gradio app, making OpenAI's contribution load-bearing without adding a cloud API dependency.
### Modal Awards
**How:** Modal is used for the fine-tuning runs for MiniCPM-V 4.6, MiniCPM5-1B, and YOLO26n, with training logs, artifacts, and published Hugging Face model links included in the Field Notes post. Modal is not just incidental infrastructure; it is the training engine that makes the local-first app domain-specific.
### Best Agent Award
**How:** Six-agent pipeline with clear separation of concerns and a visible agent trace logged to the Hugging Face Hub. This is not a single LLM call but a genuine tool-using agent workflow.
### Well-Tuned Badge 🎯
**How:** Three fine-tuned models published on Hugging Face:
1. `minicpm-v-4-6-indian-invoice-extraction`
2. `yolo26n-indian-fmcg-detection`
3. `minicpm5-1b-indian-fmcg-normalizer`
### Off the Grid Badge
**How:** MiniCPM-V 4.6 GGUF via llama.cpp + MiniCPM5-1B GGUF via llama.cpp + YOLO26n ONNX. The entire pipeline runs locally with zero cloud API calls.
### Llama Champion Badge 🦙
**How:** MiniCPM-V 4.6 and MiniCPM5-1B are served via llama.cpp using their GGUF quantized versions.
### Off-Brand Badge 🎨
**How:** Custom Gradio UI, not the default theme. Audit report card design with ₹ savings prominently displayed, colour-coded anomaly flags, and clean mobile-friendly layout.
### Sharing is Caring Badge
**How:** Agent trace logged after each audit run and shared as a Hugging Face dataset artifact.
### Field Notes Badge
**How:** Blog post: *"How I built an AI auditor for India's 12 million kirana stores"*, covering dataset creation, fine-tuning decisions, and real-world testing with a store owner.
### Bonus Quest Champion
**How:** Stack the largest credible set of badges on one polished submission: Off the Grid, Well-Tuned, Off-Brand, Llama Champion, Sharing is Caring, and Field Notes.
### Tiny Titan
**How:** Total active model budget is approximately 2.38B parameters, comfortably below the <=4B Tiny Titan threshold while still handling OCR, agentic reasoning, normalization, and product counting.
### Best Demo
**How:** The video centers on one concrete, emotional story: a real kirana owner finds a rupee-denominated loss, sees the missing items visually highlighted, and gets a practical supplier action list. The demo should show the app working, the owner reaction, the agent trace, and the final savings number.
### Community Choice
**How:** Make the Space immediately understandable: upload sample invoice, upload sample delivery photos, run audit, see rupee savings. Pair the Space with a short social post using the India kirana angle and the "find where money is being lost" tagline.
### NVIDIA Nemotron Quest
**Decision:** Explicitly not targeted. Chasing Nemotron would force a major stack change and weaken the OpenBMB/local-first Tiny Titan story. The submission focuses on Backyard AI, OpenBMB, OpenAI, Modal, and the bonus badges instead.
---
## Gradio UI Design
### Screen 1: Upload
```
┌─────────────────────────────────────────┐
│            KIRANA DETECTIVE             │
│        Your AI Business Auditor         │
├─────────────────────────────────────────┤
│                                         │
│  [ Upload Invoice ]                     │
│  Photo / PDF / WhatsApp screenshot      │
│                                         │
│  [ 📷 Upload Delivery Photos ]          │
│  Up to 5 photos of received goods       │
│                                         │
│  Supplier Name: ___________________     │
│                                         │
│  [ Run Audit ]                          │
│                                         │
└─────────────────────────────────────────┘
```
### Screen 2: Results Dashboard
```
┌─────────────────────────────────────────┐
│  AUDIT COMPLETE: HUL | INV-8821         │
├─────────────────────────────────────────┤
│                                         │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ ⚠ Price │  │ ⚠ Short │  │ ⚠ Dupli │  │
│  │  ₹330   │  │  ₹208   │  │  ₹320   │  │
│  └─────────┘  └─────────┘  └─────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │      💰 TOTAL LEAKAGE: ₹858       │  │
│  └───────────────────────────────────┘  │
│                                         │
│  [ Full Report ]  [ 📤 Share ]          │
│                                         │
└─────────────────────────────────────────┘
```
---
## Demo Story (For Submission Video)
> Ravi, a kirana store owner in Chennai, uploads one invoice from his HUL distributor and three photos of the goods delivered that morning.
>
> In 45 seconds, Kirana Detective finds:
> - Surf Excel is being charged 15.9% above the historical price
> - 4 Coke bottles are missing from the delivery
> - A Parle-G line item appears to be duplicated
>
> **Total leakage detected: βΉ858**
>
> Ravi calls his distributor. The credit note is issued the same day.
This outcome is **specific**, **measurable**, and **achievable in a real demo**, which is exactly what Backyard AI judges want to see.
---
## 10-Day Build Plan
| Day | Task | Model/Tool | Risk |
|---|---|---|---|
| 1 | Fine-tune YOLO26n on Roboflow Indian Grocery dataset | Modal GPU, Ultralytics | Low |
| 2 | Generate 500 synthetic Indian invoices; fine-tune MiniCPM-V 4.6 extraction | Modal + Unsloth | Medium |
| 3 | Fine-tune MiniCPM5-1B product normalizer; publish all 3 models to HF | Modal + Unsloth | Low |
| 4 | Build invoice OCR pipeline in Gradio: upload → MiniCPM-V → JSON | Python + Gradio | Medium |
| 5 | Build YOLO26n delivery counting pipeline: photo → count dict | ONNX Runtime | Medium |
| 6 | Build reconciliation agent + pricing anomaly detection | Rule-based Python | Low |
| 7 | Build custom Gradio dashboard UI with ₹ savings cards | Gradio + CSS | Low |
| 8 | Wire all agents together; implement trace logging; deploy to HF Space | LangGraph / custom | Medium |
| 9 | Test with real kirana owner; record demo video; capture Codex-authored commit/story proof | Codex + real user testing | Low |
| 10 | Write Field Notes blog; share agent trace; include Modal logs and final submission assets | HF Dataset + Modal logs | Low |
---
## Technical Stack Summary
| Component | Technology |
|---|---|
| Frontend | Gradio (custom theme, Off-Brand) |
| Hosting | Hugging Face Spaces |
| Primary VLM | MiniCPM-V 4.6 (GGUF via llama.cpp) |
| Agent Orchestrator | MiniCPM5-1B (GGUF via llama.cpp) |
| Counting Model | YOLO26n fine-tuned (ONNX, local) |
| Fine-tuning Platform | Modal + Unsloth (training engine for sponsor eligibility) |
| Build Agent | OpenAI Codex (commit author + build trace for OpenAI Track positioning) |
| Invoice parsing | PyMuPDF (PDF) + Gradio Image input |
| Data storage | Local JSON / SQLite (no cloud DB) |
| Agent tracing | Custom trace logger → HF Dataset |
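
For the local SQLite option, the price history that feeds the anomaly check could be stored in a single table. The schema, function names, and sample dates below are assumptions for illustration; only the Python standard-library `sqlite3` module is used.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local price-history store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS price_history ("
        "product TEXT NOT NULL, supplier TEXT NOT NULL, "
        "invoice_date TEXT NOT NULL, unit_price REAL NOT NULL)"
    )
    return conn


def record_price(conn, product, supplier, invoice_date, unit_price):
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?, ?)",
        (product, supplier, invoice_date, unit_price),
    )
    conn.commit()


def recent_prices(conn, product, supplier, limit=3):
    """Return the most recent unit prices, newest first (ISO dates sort as text)."""
    rows = conn.execute(
        "SELECT unit_price FROM price_history "
        "WHERE product = ? AND supplier = ? "
        "ORDER BY invoice_date DESC LIMIT ?",
        (product, supplier, limit),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_store()
product, supplier = "Surf Excel Washing Powder 1kg", "Hindustan Unilever Ltd"
for date, price in [("2026-05-04", 218), ("2026-05-11", 220),
                    ("2026-05-18", 220), ("2026-05-25", 222)]:
    record_price(conn, product, supplier, date, price)
history = recent_prices(conn, product, supplier)
print(history)  # [222.0, 220.0, 220.0]
```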
---
## Risk Register
| Risk | Likelihood | Mitigation |
|---|---|---|
| MiniCPM-V GGUF has high latency on CPU | Medium | Use 4-bit quantized Q4_K_M; fall back to float16 on HF Space GPU |
| YOLO26n misses products not in Roboflow dataset | Medium | Limit demo to top 10 products; expand post-hackathon |
| Delivery photo quality too low for counting | High | Show demo with clean carton photos; add "photo quality tip" in UI |
| Fine-tuning time exceeds budget | Low | All 3 models trainable in < 6 hours total on Modal A10G |
| OpenAI Track story looks indirect | Medium | Make Codex visible in commit metadata, implementation trace, Field Notes, and demo narrative |
| Modal usage looks incidental | Low | Publish Modal training logs/artifacts and explicitly link fine-tuned models to Modal runs |
| Scope creep during build week | High | Freeze scope at Day 3; no new features after Day 6 |
---
*Kirana Detective AI · Build Small Hackathon 2026 · Track 1: Backyard AI*