# Model Fine-tuning Guide

Fine-tune Kirana Detective's three models on Indian FMCG invoice data.

## Quick Start (TL;DR)

```bash
# Set these to your own credentials before running
export ROBOFLOW_API_KEY=
export HF_TOKEN=

modal run finetune/generate_invoices.py    # 10 min
modal run finetune/train_minicpm_v.py      # 2 hours
modal run finetune/train_minicpm5_1b.py    # 1 hour
modal run finetune/train_yolo26n.py        # 2 hours
```

Models auto-publish to HuggingFace Hub on completion.

---

## Three Models, Three Pipelines

### 1. MiniCPM-V 4.6 (Invoice OCR): `train_minicpm_v.py`

- **Purpose**: Extract line items, amounts, and GST from invoice images (printed PDFs, handwritten, WhatsApp screenshots)
- **Input**: 500 synthetic invoices (4 formats)
- **Method**: QLoRA fine-tuning with Unsloth
- **Output**: GGUF quantized model → HF Hub
- **Hardware**: A10G, 22 GB VRAM, ~2 hours

**Datasets used**:

- Synthetic invoices generated by `generate_invoices.py`
- Splits: train/val/test = 400/50/50
- Formats (rendered with pure Pillow, no native deps): GST, Tally PDF, handwritten, WhatsApp

---

### 2. MiniCPM5-1B (Product Name Normalizer): `train_minicpm5_1b.py`

- **Purpose**: Map invoice abbreviations (e.g., "MAGGI NDL 70GM") to canonical names
- **Input**: 2,000 synthetic (raw, canonical) pairs
- **Method**: QLoRA, 4-bit base + LoRA adapters
- **Output**: GGUF quantized model
- **Hardware**: A10G, ~1 hour

**Dataset generation**:

- Hand-curated catalog of 200 SKUs
- Rule-based augmentation: abbreviation expansion, typo injection, truncation
- Coverage: 10 major Indian FMCG suppliers

---

### 3. YOLO26n (Product Detection): `train_yolo26n.py`

- **Purpose**: Count packaged products in shelf/counter photos
- **Input**: 3 Roboflow datasets merged (11,000+ images)
- **Method**: Ultralytics standard training pipeline
- **Output**: ONNX format for CPU/GPU inference
- **Hardware**: A10G, ~2 hours

**Datasets merged**:

1. [agentsk47/indian-grocery-object-detection](https://universe.roboflow.com/agentsk47/indian-grocery-object-detection-mfsnx) v1
2. [iit-patna/grocery_items](https://universe.roboflow.com/iit-patna-qg1jh/grocery_items-7i2em) v45 (6,695 images)
3. [project-c5ho0/indian-market](https://universe.roboflow.com/project-c5ho0/indian-market-qieug) v2 (4,694 images)

---

## Prerequisites

```bash
# 1. Clone this repo
git clone https://github.com/build-small-hackathon/kirana-invoice-train-data.git
cd kirana-invoice-train-data

# 2. Install local deps (only needed to preview generated synthetic invoices)
pip install -r requirements.txt

# 3. Set up secrets for Modal/HF
modal token new
export ROBOFLOW_API_KEY=
export HF_TOKEN=

# 4. Test Modal setup
modal run finetune/generate_invoices.py
```

---

## Reproducibility Checklist

- [ ] **Dataset versioning**: All Roboflow versions pinned (v1, v45, v2)
- [ ] **Seed control**: Random seeds fixed in all training scripts
- [ ] **Output validation**: Run `tests/` after each model completes
- [ ] **HF Hub publish logs**: Check that the model card was auto-generated from training
- [ ] **GGUF quantization**: Verified mAP/F1 vs. float32 baseline

---

## Known Limitations & Biases

| Model | Limitation | Impact | Mitigation |
|---|---|---|---|
| MiniCPM-V | Only 10 FMCG suppliers in training data | Fails on uncommon brands | Add more invoices post-hackathon |
| MiniCPM5-1B | Synthetic data only (no real invoice typos) | Overfits to rule-based augmentation | Collect 200+ real examples next |
| YOLO26n | Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali) | May underperform on grocery staples | Balance class distribution across grocery categories |

---

## Troubleshooting

**"Modal timeout after 2 hours?"** → YOLO training can take 2–3h depending on the GPU queue. Increase the timeout in `modal.json`.

**"GGUF quantization fails?"** → Ensure llama.cpp is compiled with CUDA support if you intend to quantize on the GPU.

**"HF Hub publish returns 403?"** → `HF_TOKEN` must have write access. Regenerate it at huggingface.co/settings/tokens.
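
A missing or blank credential can cause failures that only surface after a long training run, such as the publish error above. The following is a minimal pre-flight sketch, not part of this repo: the helper names `missing_secrets` and `preflight` are hypothetical, and only the two variable names come from the Quick Start. It checks that the variables are set and non-blank. It cannot tell whether `HF_TOKEN` actually has write access.

```python
import os
import sys

# Variable names taken from the Quick Start; everything else here is illustrative.
REQUIRED_SECRETS = ("ROBOFLOW_API_KEY", "HF_TOKEN")


def missing_secrets(env=None):
    """Return the required variable names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]


def preflight(env=None):
    """Print a short report and return a shell-style exit code (0 = ok)."""
    missing = missing_secrets(env)
    if missing:
        print("Missing or empty: " + ", ".join(missing), file=sys.stderr)
        return 1
    print("All required secrets are set.")
    return 0
```

One way to use it would be to call `sys.exit(preflight())` from a small script and run that script before the first `modal run` command, so an empty `export HF_TOKEN=` is caught in seconds rather than at publish time.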

---

## Output Files

After successful runs, check HF Hub:

- **MiniCPM-V**: `build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction`
  - `model.gguf` (4.5 GB)
  - `model_card.md`
- **MiniCPM5-1B**: `build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer`
  - `model.gguf` (1.2 GB)
  - `model_card.md`
- **YOLO26n**: `build-small-hackathon/yolo26n-indian-fmcg-detection`
  - `best.onnx` (15 MB)
  - `class_names.json`
  - `model_card.md`

---

## Next Steps Post-Hackathon

1. **Collect real invoice data** from partnered kirana stores (500 minimum)
2. **Expand product taxonomy** (currently 200 SKUs → 2000)
3. **Add regional variants** (Hindi/Tamil/Malayalam abbreviations)
4. **Benchmark inference latency** on Raspberry Pi / Android devices
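
For the latency benchmarking step, a small timing harness may be a reasonable starting point. The sketch below is an assumption, not code from this repo: `benchmark_latency` is a hypothetical helper, and `predict` stands for any zero-argument callable that wraps a single inference call on the target device. It uses only the Python standard library and reports the median, nearest-rank 95th percentile, and maximum latency in milliseconds.

```python
import math
import statistics
import time


def benchmark_latency(predict, warmup=3, runs=20):
    """Time `predict()` and return latency statistics in milliseconds.

    `predict` is a hypothetical zero-argument callable wrapping one
    inference call (for example, a closure over a loaded model and a
    fixed sample input).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are executed but not measured.
    for _ in range(warmup):
        predict()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    # Nearest-rank percentile: smallest sample with at least 95% at or below it.
    p95_index = max(0, math.ceil(0.95 * len(samples_ms)) - 1)
    return {
        "runs": runs,
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


# Stand-in workload so the sketch runs without any model installed.
stats = benchmark_latency(lambda: sum(i * i for i in range(10_000)))
print(stats)
```

Reporting the median alongside a tail percentile is a deliberate choice here: on small devices, occasional slow runs (thermal throttling, background processes) can skew a plain mean.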