# Model Fine-tuning Guide

Fine-tune Kirana Detective's three models on Indian FMCG invoice and shelf-image data.

## Quick Start (TL;DR)
```bash
export ROBOFLOW_API_KEY="<your-key>"
export HF_TOKEN="<your-token>"
modal run finetune/generate_invoices.py   # ~10 min
modal run finetune/train_minicpm_v.py     # up to 2 hours (~52 min observed on an A10G)
modal run finetune/train_minicpm5_1b.py   # ~1 hour
modal run finetune/train_yolo26n.py       # ~2 hours
```
Models auto-publish to the Hugging Face Hub on completion.

---

## Three Models, Three Pipelines
### 1. MiniCPM-V 4.6 (Invoice OCR): `train_minicpm_v.py`

**Purpose**: Extract line items, amounts, and GST from invoice images (printed PDFs, handwritten invoices, WhatsApp screenshots).

- **Input**: 500 synthetic invoices (4 formats)
- **Method**: QLoRA fine-tuning via PEFT + bitsandbytes (Unsloth is incompatible with MiniCPM-V 4.6)
- **Output**: LoRA adapter → merged HF weights (bfloat16). GGUF conversion is a separate manual step via the [gguf-my-repo Space](https://huggingface.co/spaces/ggml-org/gguf-my-repo).
- **Hardware**: A10G, 22 GB VRAM, ~52 min (actual)

**Datasets used**:
- Synthetic invoices generated by `generate_invoices.py`
- Splits: train/val/test = 400/50/50
- Formats (rendered with pure Pillow, no native dependencies): GST, Tally PDF, handwritten, WhatsApp
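
The 400/50/50 split can be made repeatable with a seeded shuffle. The following is a minimal sketch, not the logic of `generate_invoices.py`: the helper name `split_invoices`, the seed value, and the `invoice_0001`-style IDs are all assumptions made for illustration.

```python
import random


def split_invoices(invoice_ids, n_train=400, n_val=50, n_test=50, seed=42):
    """Shuffle IDs with a fixed seed and cut them into train/val/test.

    Hypothetical helper: the seed and the ID format are illustrative
    assumptions, not values taken from generate_invoices.py.
    """
    if len(invoice_ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must add up to the number of invoices")
    ids = sorted(invoice_ids)          # canonical order before shuffling
    random.Random(seed).shuffle(ids)   # local RNG, global state untouched
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


invoice_ids = [f"invoice_{i:04d}" for i in range(500)]
splits = split_invoices(invoice_ids)
print({name: len(ids) for name, ids in splits.items()})
# {'train': 400, 'val': 50, 'test': 50}
```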

---

### 2. MiniCPM5-1B (Product Name Normalizer): `train_minicpm5_1b.py`

**Purpose**: Map invoice abbreviations (e.g., "MAGGI NDL 70GM") to canonical product names.

- **Input**: 2,000 synthetic (raw, canonical) pairs
- **Method**: QLoRA (4-bit quantized base model + LoRA adapters)
- **Output**: GGUF quantized model (Q4_K_M)
- **Hardware**: A10G, ~1 hour

**Dataset generation**:
- Hand-curated catalog of 200 SKUs
- Rule-based augmentation: abbreviation expansion, typo injection, truncation
- Coverage: 10 major Indian FMCG suppliers
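
To make the three augmentation rules concrete, here is a minimal sketch of how (raw, canonical) pairs could be produced from one catalog entry. It is not the project's generator: the abbreviation table, the 30% rule probabilities, the 12-character column width, and the sample product name are assumptions, and the abbreviation rule is modeled here as swapping full words for invoice-style short forms.

```python
import random

# Hypothetical abbreviation table; the real catalog and rules may differ.
ABBREVIATIONS = {"NOODLES": "NDL", "BISCUIT": "BISC", "SHAMPOO": "SHMP"}


def abbreviate(name):
    """Replace known words with their invoice-style short forms."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in name.split())


def inject_typo(name, rng):
    """Swap two adjacent characters at a random position."""
    if len(name) < 2:
        return name
    i = rng.randrange(len(name) - 1)
    return name[:i] + name[i + 1] + name[i] + name[i + 2:]


def truncate(name, max_len=12):
    """Cut the name the way a narrow invoice column would."""
    return name[:max_len].rstrip()


def make_pairs(canonical, n, seed=0):
    """Return n (raw, canonical) training pairs for one catalog entry."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        raw = abbreviate(canonical.upper())
        if rng.random() < 0.3:   # assumed typo rate
            raw = inject_typo(raw, rng)
        if rng.random() < 0.3:   # assumed truncation rate
            raw = truncate(raw)
        pairs.append((raw, canonical))
    return pairs


for raw, canonical in make_pairs("Maggi Noodles 70g", n=3):
    print(f"{raw!r} -> {canonical!r}")
```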

---

### 3. YOLO26n (Product Detection): `train_yolo26n.py`

**Purpose**: Count packaged products in shelf/counter photos.

- **Input**: 3 merged Roboflow datasets (11,000+ images)
- **Method**: Standard Ultralytics training pipeline
- **Output**: ONNX model for CPU/GPU inference
- **Hardware**: A10G, ~2 hours

**Datasets merged**:
1. [agentsk47/indian-grocery-object-detection](https://universe.roboflow.com/agentsk47/indian-grocery-object-detection-mfsnx) v1
2. [iit-patna/grocery_items](https://universe.roboflow.com/iit-patna-qg1jh/grocery_items-7i2em) v45 (6,695 images)
3. [project-c5ho0/indian-market](https://universe.roboflow.com/project-c5ho0/indian-market-qieug) v2 (4,694 images)
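
Merging detection datasets means giving every source class one shared ID and rewriting each label file to match. The sketch below shows one way this could work on YOLO-format label lines (`class cx cy w h`). The name normalization rule (trim and lower-case), the helper names, and the toy class lists are assumptions and may not match what `train_yolo26n.py` does.

```python
def build_unified_classes(dataset_classes):
    """Merge per-dataset class lists into one sorted, de-duplicated list.

    Names are normalized (trimmed, lower-cased) so "Maggi" and "maggi"
    collapse into one class. This rule is an illustrative assumption.
    """
    unified = sorted({name.strip().lower()
                      for names in dataset_classes.values()
                      for name in names})
    index = {name: i for i, name in enumerate(unified)}
    # For each dataset: old class id -> unified class id
    remap = {
        ds: {old_id: index[name.strip().lower()]
             for old_id, name in enumerate(names)}
        for ds, names in dataset_classes.items()
    }
    return unified, remap


def remap_label_line(line, id_map):
    """Rewrite the class id of one YOLO label line: 'cls cx cy w h'."""
    cls, *box = line.split()
    return " ".join([str(id_map[int(cls)]), *box])


# Toy class lists; the real datasets contain far more classes.
datasets = {
    "indian-grocery": ["Maggi", "Parle-G"],
    "grocery_items": ["parle-g", "Nivea Cream"],
    "indian-market": ["Tresemme Shampoo", "maggi"],
}
unified, remap = build_unified_classes(datasets)
print(unified)
print(remap_label_line("0 0.5 0.5 0.2 0.3", remap["grocery_items"]))
```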

---

## Prerequisites

```bash
# 1. Clone this repo
git clone https://github.com/naazimsnh02/kirana-detective.git
cd kirana-detective

# 2. Install local deps (only needed to preview generated synthetic invoices)
pip install -r requirements.txt

# 3. Set up secrets for Modal/HF
modal token new
export ROBOFLOW_API_KEY="<your-key>"   # from your Roboflow Universe account
export HF_TOKEN="<your-token>"         # from huggingface.co/settings/tokens

# 4. Test Modal setup
modal run finetune/generate_invoices.py
```
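
Before launching GPU jobs, it can help to confirm that both secrets are actually exported. This is a minimal sketch that relies only on the two variable names from the steps above. `missing_secrets` is a hypothetical helper, not part of the repo, and treating `<...>` values as unfilled placeholders is a convention chosen for this example.

```python
import os

REQUIRED_VARS = ("ROBOFLOW_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return required variables that are unset, empty, or still a placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # "<your-key>"-style values mean the template was never filled in
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(name)
    return missing


problems = missing_secrets()
print("Missing secrets:", ", ".join(problems) if problems else "none")
```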

---

## Reproducibility Checklist

- [ ] **Dataset versioning**: All Roboflow versions pinned (v1, v45, v2)
- [ ] **Seed control**: Random seeds fixed in all training scripts
- [ ] **Output validation**: Run `tests/` after each model completes
- [ ] **HF Hub publish logs**: Check that the model card was auto-generated from the training run
- [ ] **GGUF quantization**: Verify mAP/F1 against the float32 baseline
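
For the seed-control item, a common pattern is a single helper that seeds every random number generator present in the environment. The sketch below is an assumption about how this could be done, not a copy of the training scripts. The function name and the seed value are illustrative, and NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random


def seed_everything(seed=42):
    """Seed every RNG that is available in the current environment."""
    # Only affects Python subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
    return seed


seed_everything(42)
first = random.random()
seed_everything(42)
print(random.random() == first)  # True
```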

---

## Known Limitations & Biases

| Model | Limitation | Impact | Mitigation |
|---|---|---|---|
| MiniCPM-V | Only 10 FMCG suppliers in training data | Fails on uncommon brands | Add more invoices post-hackathon |
| MiniCPM5-1B | Synthetic data only (no real invoice typos) | Overfits to rule-based augmentation | Collect 200+ real examples next |
| YOLO26n | Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali) | May underperform on grocery staples | Balance class distribution across grocery categories |
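
The class skew noted for YOLO26n can be measured directly from the label files before rebalancing. The following is a minimal sketch that counts boxes per class from YOLO-format label lines (`class cx cy w h`). The class names and labels are toy data invented for illustration, and the max/min ratio is just one simple imbalance measure. Classes with no boxes at all do not appear in the counts.

```python
from collections import Counter


def class_distribution(label_lines, class_names):
    """Count boxes per class from YOLO label lines ('cls cx cy w h')."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[class_names[int(parts[0])]] += 1
    return counts


def imbalance_ratio(counts):
    """Most frequent class count divided by least frequent (1.0 = balanced)."""
    return max(counts.values()) / min(counts.values())


# Toy data: class names and labels are made up for illustration.
class_names = ["shampoo", "face cream", "rice", "dal"]
labels = (["0 0.5 0.5 0.1 0.2"] * 6 + ["1 0.4 0.4 0.1 0.1"] * 4
          + ["2 0.3 0.3 0.2 0.2"] + ["3 0.6 0.6 0.1 0.1"])
counts = class_distribution(labels, class_names)
print(counts.most_common())
print(f"imbalance ratio: {imbalance_ratio(counts):.1f}")
```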

---

## Troubleshooting

**"Modal timeout after 2 hours?"**
→ YOLO training can take 2–3 hours depending on the GPU queue. Increase the `timeout` argument (in seconds) on the Modal function decorator in `finetune/train_yolo26n.py`.

**"GGUF quantization fails?"**
→ Ensure llama.cpp is compiled with CUDA support if GPU quantization is intended.

**"HF Hub publish returns 403?"**
→ `HF_TOKEN` must have write access. Regenerate it at huggingface.co/settings/tokens.

---

## Output Files
Training scripts first publish to the personal `naazimsnh02/` namespace; the models are then
manually transferred to the `build-small-hackathon/` org for the hackathon submission.

**After training runs, check HF Hub (`naazimsnh02/`):**

- **MiniCPM-V LoRA adapter**: `naazimsnh02/minicpm-v-4-6-indian-invoice-extraction`
  - LoRA adapter files (`adapter_config.json`, `adapter_model.safetensors`, etc.)
  - `mmproj.gguf` (vision encoder, uploaded separately via `export_minicpm_v_gguf.py`)
- **MiniCPM-V merged weights**: `naazimsnh02/minicpm-v-4-6-indian-invoice-extraction-merged`
  - Full merged bfloat16 weights (no PEFT required at inference)
  - Run `modal run finetune/export_minicpm_v_gguf.py` after training to create this repo
- **MiniCPM5-1B**: `naazimsnh02/minicpm5-1b-indian-fmcg-normalizer`
  - `model.gguf` (Q4_K_M, ~1.2 GB)
- **YOLO26n**: `naazimsnh02/yolo26n-indian-fmcg-detection`
  - `yolo26n_fmcg.onnx` (~15 MB, opset 12)
  - `best.pt` (PyTorch checkpoint)
  - `class_names.json` (1,831 unified classes from the merged dataset)

**Hackathon / production repos (after manual transfer):**

- `build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged`
- `build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`
- `build-small-hackathon/yolo26n-indian-fmcg-detection`
- `build-small-hackathon/kirana-invoice-train-data` (HF dataset)

**Sharing is Caring (trace datasets):**

```bash
# Upload Claude Code build sessions (run once after project is complete)
export HF_TOKEN="<your-token>"
python finetune/upload_build_traces.py
# → publishes to build-small-hackathon/kirana-detective-build-traces
# → viewable in HF Data Studio native trace viewer

# Runtime audit traces are auto-published by tracer.py during app use
# → build-small-hackathon/kirana-detective-traces
```

---

## Next Steps Post-Hackathon

1. **Collect real invoice data** from partner kirana stores (at least 500 invoices)
2. **Expand product taxonomy** (from the current 200 SKUs to 2,000)
3. **Add regional variants** (Hindi/Tamil/Malayalam abbreviations)
4. **Benchmark inference latency** on Raspberry Pi / Android devices