Model Fine-tuning Guide

Fine-tune Kirana Detective's three models on Indian FMCG invoice and shelf-photo data.

Quick Start (TL;DR)

export ROBOFLOW_API_KEY=<your-key>
export HF_TOKEN=<your-token>
modal run finetune/generate_invoices.py     # 10 min
modal run finetune/train_minicpm_v.py       # ~1 hour (52 min measured on A10G)
modal run finetune/train_minicpm5_1b.py     # 1 hour
modal run finetune/train_yolo26n.py         # 2 hours

Models auto-publish to HuggingFace Hub on completion.


Three Models, Three Pipelines

1. MiniCPM-V 4.6 (Invoice OCR) — train_minicpm_v.py

Purpose: Extract line items, amounts, GST from invoice images (printed PDFs, handwritten, WhatsApp screenshots)

Input: 500 synthetic invoices (4 formats)
Method: QLoRA fine-tuning via PEFT + bitsandbytes (Unsloth is incompatible with MiniCPM-V 4.6)
Output: LoRA adapter → merged HF weights (bfloat16). GGUF conversion is a separate manual step via the gguf-my-repo Space.
Hardware: A10G, 22 GB VRAM, ~52 min (actual)

Datasets used:

  • Synthetic invoices generated by generate_invoices.py
  • Splits: train/val/test = 400/50/50
  • Formats (4): GST, Tally PDF, handwritten, WhatsApp; all rendered with pure Pillow (no native dependencies)
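The 400/50/50 split can be sketched as a seeded shuffle followed by three slices. This is a minimal illustration, not the code in generate_invoices.py: the function name `split_dataset`, the seed value, and the file names are assumptions made for the example.

```python
import random


def split_dataset(items, n_train=400, n_val=50, n_test=50, seed=42):
    """Shuffle deterministically, then cut into train/val/test slices."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("split sizes must sum to the number of items")
    shuffled = list(items)
    # A private Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical file names; the real naming scheme is not documented here.
invoice_ids = [f"invoice_{i:04d}.png" for i in range(500)]
train, val, test = split_dataset(invoice_ids)
print(len(train), len(val), len(test))
```

Fixing the seed means the same invoices land in the same split on every run, which keeps validation and test numbers comparable across training runs.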

2. MiniCPM5-1B (Product Name Normalizer) — train_minicpm5_1b.py

Purpose: Map invoice abbreviations (e.g., "MAGGI NDL 70GM") to canonical names

Input: 2,000 synthetic (raw, canonical) pairs
Method: QLoRA, 4-bit base + LoRA adapters
Output: GGUF quantized model
Hardware: A10G, ~1 hour

Dataset generation:

  • Hand-curated 200 SKU catalog
  • Rule-based augmentation: abbreviation expansion, typo injection, truncation
  • Coverage: 10 major Indian FMCG suppliers
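The rule-based augmentation can be pictured with the sketch below, which turns canonical names into noisy invoice-style strings. It is an illustration under assumptions: the abbreviation table, the probabilities, and all function names are hypothetical, and the actual rules in train_minicpm5_1b.py may differ.

```python
import random

# Hypothetical table: canonical token -> short forms seen on invoices.
ABBREVIATIONS = {"NOODLES": ["NDL", "NDLS"], "GRAM": ["GM", "G"], "BISCUIT": ["BISC"]}


def abbreviate(name, rng):
    """Upper-case the name and swap known tokens for a random short form."""
    out = []
    for token in name.upper().split():
        forms = ABBREVIATIONS.get(token)
        out.append(rng.choice(forms) if forms else token)
    return " ".join(out)


def inject_typo(text, rng):
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def truncate(text, rng, min_keep=8):
    """Cut the string at a random length, keeping at least min_keep chars."""
    if len(text) <= min_keep:
        return text
    return text[:rng.randint(min_keep, len(text))].rstrip()


def make_pairs(catalog, per_sku, seed=0):
    """Return (raw, canonical) training pairs, per_sku variants per entry."""
    rng = random.Random(seed)
    pairs = []
    for canonical in catalog:
        for _ in range(per_sku):
            raw = abbreviate(canonical, rng)
            if rng.random() < 0.3:  # assumed typo rate
                raw = inject_typo(raw, rng)
            if rng.random() < 0.2:  # assumed truncation rate
                raw = truncate(raw, rng)
            pairs.append((raw, canonical))
    return pairs


catalog = ["Maggi Noodles 70 Gram", "Parle-G Biscuit 100 Gram"]
pairs = make_pairs(catalog, per_sku=10)
print(pairs[0])
```

With a 200-SKU catalog, 10 variants per SKU would yield 2,000 pairs, which matches the dataset size above; whether the real script uses that exact ratio is an assumption.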

3. YOLO26n (Product Detection) — train_yolo26n.py

Purpose: Count packaged products in shelf/counter photos

Input: 3 Roboflow datasets merged (11,000+ images)
Method: Ultralytics standard training pipeline
Output: ONNX format for CPU/GPU inference
Hardware: A10G, ~2 hours

Datasets merged:

  1. agentsk47/indian-grocery-object-detection v1
  2. iit-patna/grocery_items v45 (6,695 images)
  3. project-c5ho0/indian-market v2 (4,694 images)
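Merging detection datasets means reconciling their class lists, because class id 0 in one dataset is usually a different product in another. The sketch below shows one way to build a unified class list and rewrite YOLO label lines (`<class> <cx> <cy> <w> <h>`). The function names and the lower-case normalization rule are assumptions for illustration, not a description of what train_yolo26n.py does.

```python
def build_unified_classes(datasets):
    """Map each dataset's local class ids onto one shared class list.

    datasets: {dataset_name: [class names in that dataset's id order]}
    Returns (unified_names, remap) where remap[dataset_name][old_id] = new_id.
    """
    unified = []
    index_of = {}
    remap = {}
    for ds_name, names in datasets.items():
        local = {}
        for old_id, raw in enumerate(names):
            key = raw.strip().lower()  # assumed normalization rule
            if key not in index_of:
                index_of[key] = len(unified)
                unified.append(key)
            local[old_id] = index_of[key]
        remap[ds_name] = local
    return unified, remap


def remap_label_line(line, id_map):
    """Rewrite the class id of one YOLO label line, leaving the box as-is."""
    parts = line.split()
    parts[0] = str(id_map[int(parts[0])])
    return " ".join(parts)


# Toy class lists; the real datasets have far more classes.
datasets = {
    "dataset_a": ["Maggi", "Nivea Cream"],
    "dataset_b": ["nivea cream", "Parle-G"],
}
unified, remap = build_unified_classes(datasets)
new_line = remap_label_line("1 0.5 0.5 0.2 0.3", remap["dataset_b"])
print(unified, new_line)
```

Matching on normalized names merges only exact duplicates; near-duplicates such as different spellings of the same product stay separate unless a manual alias table is added.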

Prerequisites

# 1. Clone this repo
git clone https://github.com/naazimsnh02/kirana-detective.git
cd kirana-detective

# 2. Install local deps (only needed to preview the generated synthetic invoices)
pip install -r requirements.txt

# 3. Set up secrets for Modal/HF
modal token new
export ROBOFLOW_API_KEY=<from your Roboflow Universe account>
export HF_TOKEN=<from huggingface.co/settings/tokens>

# 4. Test Modal setup
modal run finetune/generate_invoices.py

Reproducibility Checklist

  • Dataset versioning: All Roboflow versions pinned (v1, v45, v2)
  • Seed control: Random seeds fixed in all training scripts
  • Output validation: Run tests/ after each model completes
  • HF Hub publish logs: Check that the model card was auto-generated from the training run
  • GGUF quantization: Verified mAP/F1 vs. float32 baseline
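Two checklist items, seed control and quantization verification, can be sketched as small helpers. This is an illustration only: `set_seed`, `within_tolerance`, the seed value, and the 0.02 tolerance are assumptions, and the metric values in the usage lines are placeholders rather than measured results.

```python
import os
import random


def set_seed(seed=42):
    """Seed every random number generator that is available."""
    random.seed(seed)
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    except ImportError:
        pass  # PyTorch is optional for this sketch


def within_tolerance(baseline, quantized, max_drop=0.02):
    """True if the quantized metric lost at most max_drop vs. the baseline."""
    return (baseline - quantized) <= max_drop


set_seed(7)
first = random.random()
set_seed(7)
second = random.random()
print(first == second)

# Placeholder metric values, not results from this project.
print(within_tolerance(0.90, 0.89), within_tolerance(0.90, 0.85))
```

Seeding alone does not guarantee bit-identical GPU training, since some CUDA kernels are nondeterministic, so small run-to-run differences in metrics are still possible.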

Known Limitations & Biases

| Model | Limitation | Impact | Mitigation |
|---|---|---|---|
| MiniCPM-V | Only 10 FMCG suppliers in training data | Fails on uncommon brands | Add more invoices post-hackathon |
| MiniCPM5-1B | Synthetic data only (no real invoice typos) | Overfits to rule-based augmentation | Collect 200+ real examples next |
| YOLO26n | Merged dataset skewed toward beauty/personal care (Tresemmé, Nivea, Patanjali) | May underperform on grocery staples | Balance class distribution across grocery categories |

Troubleshooting

"Modal timeout after 2 hours?"
→ YOLO training can take 2–3 hours depending on the GPU queue. Raise the timeout argument on the Modal function in finetune/train_yolo26n.py.

"GGUF quantization fails?"
→ Ensure llama.cpp is compiled with CUDA support if GPU quantization is intended.

"HF Hub publish returns 403?"
→ HF_TOKEN must have write access. Regenerate it at huggingface.co/settings/tokens.


Output Files

Training scripts publish initially to the personal naazimsnh02/ namespace; models are then manually transferred to the build-small-hackathon/ org for the hackathon submission.

After training runs, check HF Hub (naazimsnh02/):

  • MiniCPM-V LoRA adapter: naazimsnh02/minicpm-v-4-6-indian-invoice-extraction

    • LoRA adapter files (adapter_config.json, adapter_model.safetensors, etc.)
    • mmproj.gguf (vision encoder, uploaded separately via export_minicpm_v_gguf.py)
  • MiniCPM-V merged weights: naazimsnh02/minicpm-v-4-6-indian-invoice-extraction-merged

    • Full merged bfloat16 weights (no PEFT required at inference)
    • Run modal run finetune/export_minicpm_v_gguf.py after training to create this repo
  • MiniCPM5-1B: naazimsnh02/minicpm5-1b-indian-fmcg-normalizer

    • model.gguf (Q4_K_M, ~1.2 GB)
  • YOLO26n: naazimsnh02/yolo26n-indian-fmcg-detection

    • yolo26n_fmcg.onnx (~15 MB, opset 12)
    • best.pt (PyTorch checkpoint)
    • class_names.json (1,831 unified classes from merged dataset)
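Since the detector exists to count products, class_names.json is what turns predicted class ids into product names. The sketch below assumes the file holds either a JSON list (index = class id) or an object keyed by id; the actual layout is not documented here, and the helper names are hypothetical.

```python
import json
import os
import tempfile
from collections import Counter


def load_class_names(path):
    """Load id -> name mapping from a JSON list or a JSON object."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return {int(k): v for k, v in data.items()}
    return dict(enumerate(data))


def count_products(class_ids, names):
    """Count detections per product name from predicted class ids."""
    return Counter(names[i] for i in class_ids)


# Stand-in file with three toy classes; the published file has 1,831.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "class_names.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(["maggi", "nivea cream", "parle-g"], f)
    names = load_class_names(path)

# Class ids as they might come out of the detector for one shelf photo.
counts = count_products([0, 2, 2, 1, 2], names)
print(counts.most_common())
```

In practice the class ids would come from post-processed ONNX or Ultralytics predictions after confidence filtering and non-maximum suppression, which this sketch leaves out.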

Hackathon / production repos (after manual transfer):

  • build-small-hackathon/minicpm-v-4-6-indian-invoice-extraction-merged
  • build-small-hackathon/minicpm5-1b-indian-fmcg-normalizer
  • build-small-hackathon/yolo26n-indian-fmcg-detection
  • build-small-hackathon/kirana-invoice-train-data (HF dataset)

Sharing is Caring — trace datasets:

# Upload Claude Code build sessions (run once after project is complete)
export HF_TOKEN=<your-token>
python finetune/upload_build_traces.py
# → publishes to build-small-hackathon/kirana-detective-build-traces
# → viewable in HF Data Studio native trace viewer

# Runtime audit traces are auto-published by tracer.py during app use
# → build-small-hackathon/kirana-detective-traces

Next Steps Post-Hackathon

  1. Collect real invoice data from partner kirana stores (at least 500 invoices)
  2. Expand the product taxonomy (currently 200 SKUs; target 2,000)
  3. Add regional variants (Hindi/Tamil/Malayalam abbreviations)
  4. Benchmark inference latency on Raspberry Pi / Android devices