# Kirana Detective: Build Progress

Deadline: **June 15, 2026** (5 days remaining as of June 10)

---

## Status Legend
- [x] Done
- [~] In progress / partial
- [ ] Not started
- [!] Blocked / needs action

---

## Pre-build: Fine-tune & Publish Models

| Task | Status | File | Notes |
|---|---|---|---|
| 0.1 Fine-tune YOLO26n | [ ] | `finetune/train_yolo26n.py` | Run `modal run finetune/train_yolo26n.py` |
| 0.2a Generate 500 synthetic invoices | [x] | `finetune/generate_invoices.py` | 500 images in 4 formats; pure Pillow, no native deps |
| 0.2b Fine-tune MiniCPM-V 4.6 | [ ] | `finetune/train_minicpm_v.py` | Run `modal run finetune/train_minicpm_v.py` after uploading invoices |
| 0.3 Fine-tune MiniCPM5-1B | [ ] | `finetune/train_minicpm5_1b.py` | Run `modal run finetune/train_minicpm5_1b.py` |

**Action needed**: Run all three Modal jobs TODAY (June 10); each takes 1-3h.

---

## Core Implementation

| Task | Status | File | Notes |
|---|---|---|---|
| 1. Project scaffolding | [x] | `requirements.txt`, `README.md`, dirs | Done |
| 2. Data models | [x] | `models.py` | All dataclasses + CANONICAL_AGENT_ORDER |
| 3.1 FMCG catalog JSON | [x] | `data/fmcg_catalog.json` | 200 SKUs, 7 categories |
| 3.2 FMCGCatalog class | [x] | `catalog.py` | Alias lookup, GST prefix match, singleton |
| 4. Storage layer | [x] | `storage.py` | SQLite + degraded mode + 90-day retention |
| 5. Agent Tracer | [x] | `tracer.py` | HF Hub publish with retry + daemon thread |
| 6. Agent 1: Invoice Extractor | [x] | `agents/invoice_extractor.py` | MiniCPM-V 4.6 |
| 7. Agent 2: Product Matcher | [x] | `agents/product_matcher.py` | MiniCPM5-1B |
| 8. Agent 3: Pricing Agent | [x] | `agents/pricing_agent.py` | Rule-based |
| 9. Agent 4: Visual Counter | [x] | `agents/visual_counter.py` | YOLO26n ONNX |
| 10. Agent 5: Reconciliation Agent | [x] | `agents/reconciliation_agent.py` | Rule-based |
| 11. Agent 6: Savings Agent | [x] | `agents/savings_agent.py` | MiniCPM5-1B |
| 12. Checkpoint (unit tests pass) | [ ] | `tests/` | |
| 13. Pipeline Orchestrator | [x] | `pipeline.py` | |
| 14. Backend entry point | [x] | `app.py` | Gradio gr.Server |
| 15. Custom frontend | [x] | `static/index.html` | Off-Brand badge |
| 16. Property-based tests | [ ] | `tests/test_properties.py` | Hypothesis |
| 17. Checkpoint (all tests pass) | [ ] | - | |
| 18. HF Space deployment | [ ] | `README.md` + `verify_models.py` | |

---

## Fine-tune Scripts Status

| Script | Modal Secret | Ready to run? |
|---|---|---|
| `train_yolo26n.py` | `roboflow-secret`, `hf-secret` | Yes (Modal secrets set) |
| `generate_invoices.py` | None (local) | Needs `pip install weasyprint augraphy` |
| `train_minicpm_v.py` | `hf-secret` | After `generate_invoices.py` |
| `train_minicpm5_1b.py` | `hf-secret` | After catalog is on Modal volume |

---

## Modal Setup

```bash
# Secrets (already done):
modal secret create roboflow-secret ROBOFLOW_API_KEY=<key>
modal secret create hf-secret HF_TOKEN=<token>

# Upload catalog to Modal volume (needed for train_minicpm5_1b):
modal volume put kirana-synth-data data/fmcg_catalog.json fmcg_catalog.json

# Run fine-tune jobs (start all 3 Modal jobs today; they can run in parallel):
modal run finetune/train_yolo26n.py
python finetune/generate_invoices.py        # local script, not a Modal job
modal run finetune/train_minicpm_v.py       # after invoices are generated and uploaded
modal run finetune/train_minicpm5_1b.py     # after the catalog upload above
```
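
The three `modal run` commands above could also be started together from one script. The following is a minimal sketch under stated assumptions, not part of the repo: `build_commands` and `launch_all` are hypothetical helper names, and it assumes the `modal` CLI is on `PATH`, the secrets are set, and the invoice and catalog uploads are already done. Calling `launch_all(build_commands())` starts all three jobs at once and returns an exit code per script.

```python
import subprocess
from typing import Callable, Sequence

# Script paths copied from the fine-tune task table in this file.
MODAL_JOBS = (
    "finetune/train_yolo26n.py",
    "finetune/train_minicpm_v.py",
    "finetune/train_minicpm5_1b.py",
)


def build_commands(scripts: Sequence[str] = MODAL_JOBS) -> list[list[str]]:
    """Return one `modal run <script>` argv list per fine-tune script."""
    return [["modal", "run", script] for script in scripts]


def launch_all(
    commands: Sequence[Sequence[str]],
    popen: Callable = subprocess.Popen,  # injectable so the logic can be tested offline
) -> dict[str, int]:
    """Start every command first, then wait for each; map script path -> exit code."""
    procs = [(cmd[-1], popen(list(cmd))) for cmd in commands]
    return {script: proc.wait() for script, proc in procs}
```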

---

## HF Repos to Create

After the fine-tune jobs publish, verify that these exist:
- `build-small-hackathon/yolo26n-indian-fmcg-detection`
- `build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction`
- `build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`
- `build-small-hackathon/kirana-detective-traces` (dataset; create manually before the first audit run)
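
The existence check for the four repos listed above could be scripted roughly as follows. This is a hedged sketch, not the project's `verify_models.py`: `hub_repo_exists` and `missing_repos` are hypothetical names, and it assumes `huggingface_hub` is installed and a token is available (via `HF_TOKEN` or a cached login) if any repo is private. Running `print(missing_repos())` lists whatever still needs to be created or published.

```python
from typing import Callable, Mapping

# Repo IDs and types copied from the list above.
EXPECTED_REPOS = {
    "build-small-hackathon/yolo26n-indian-fmcg-detection": "model",
    "build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction": "model",
    "build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer": "model",
    "build-small-hackathon/kirana-detective-traces": "dataset",
}


def hub_repo_exists(repo_id: str, repo_type: str) -> bool:
    """Ask the Hub whether a repo exists (imports huggingface_hub lazily)."""
    from huggingface_hub import HfApi

    return HfApi().repo_exists(repo_id, repo_type=repo_type)


def missing_repos(
    expected: Mapping[str, str] = EXPECTED_REPOS,
    exists: Callable[[str, str], bool] = hub_repo_exists,  # injectable for offline tests
) -> list[str]:
    """Return the expected repo IDs that the `exists` check reports as absent."""
    return [rid for rid, rtype in expected.items() if not exists(rid, rtype)]
```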

---

## Badge Checklist

| Badge | Requirement | Status |
|---|---|---|
| Off the Grid | Zero cloud API calls in inference | [ ] |
| Well-Tuned | 3 fine-tuned models on HF Hub | [ ] |
| Off-Brand | Custom gr.Server frontend | [ ] |
| Llama Champion | MiniCPM models via llama.cpp | [ ] |
| Sharing is Caring | Agent trace to HF Dataset after each audit | [ ] |
| Field Notes | Blog post | [ ] |

---

## Day-by-Day Remaining Plan

| Day | Date | Focus |
|---|---|---|
| Day 6 | June 10 | Kick off Modal fine-tune jobs + implement `models.py`, `catalog.py` |
| Day 7 | June 11 | `storage.py`, `tracer.py`, agents 1-3 |
| Day 8 | June 12 | agents 4-6, `pipeline.py` |
| Day 9 | June 13 | `app.py`, `static/index.html`, full pipeline test |
| Day 10 | June 14 | Tests, demo video, blog post, HF Space deploy |
| Deadline | June 15 | Submit |