Dukaan Saathi: Receipt Parser (Llama-3.2-3B fine-tune)
Fine-tuned Llama-3.2-3B-Instruct for structured receipt parsing in Indian kirana (convenience) store workflows.
Part of the Dukaan Saathi inventory copilot demo.
What it does
Takes noisy OCR text from supplier receipts and returns a structured JSON object with line items, quantities, prices, and supplier info. Designed for messy real-world receipts: handwritten bills, printed tax invoices, and informal tally notes.
Training data
- 6 hand-authored examples from real kirana receipt formats
- 22 Modal LLM-generated synthetic examples augmenting edge cases
- Total: 28 examples; training focuses on format consistency over broad generalisation
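To illustrate what "format consistency" could look like in practice, here is a minimal sketch of rendering one receipt/JSON pair into a single fixed text layout. It assumes, without confirmation from this card, that training examples used the same instruction template shown in the Inference section; `render_training_example` and its arguments are hypothetical names.

```python
import json

# Assumption: the template mirrors the prompt in the Inference section;
# the card does not confirm the actual training format.
INSTRUCTION = (
    "You are a receipt parser for an Indian convenience store. "
    "Extract all line items. Return ONLY valid JSON, no markdown."
)

def render_training_example(receipt_text: str, target: dict) -> str:
    """Render one (receipt, parsed JSON) pair as a single training string."""
    # Single-line JSON with insertion-ordered keys keeps every target in the same shape.
    response = json.dumps(target, ensure_ascii=False)
    return (
        f"### Instruction:\n{INSTRUCTION}\n"
        f"### Input:\n{receipt_text.strip()}\n"
        f"### Response:\n{response}"
    )

example = render_training_example(
    "MAHALAKSHMI MARKETING\nNo. 2816 Date: 27/5/26\nParle 1 X 2450 = 2450",
    {"supplier": "Mahalakshmi Marketing", "invoice_no": "2816"},
)
print(example)
```

Serialising every target through the same function is one simple way to keep key order and spacing identical across a small dataset.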
Example
Input:
MAHALAKSHMI MARKETING
No. 2816 Date: 27/5/26
Parle 1 X 2450 = 2450
Bingo(C) 4 X 870 = 3480
Subtotal 5930 Discount 612 Total 6542
Output:
{
"supplier": "Mahalakshmi Marketing",
"invoice_no": "2816",
"date": "2026-05-27",
"items": [
{"product_raw": "Parle", "qty_cases": 1, "qty_units": 1, "unit_cost": 2450.0, "total": 2450.0},
{"product_raw": "Bingo(C)", "qty_cases": 4, "qty_units": 4, "unit_cost": 870.0, "total": 3480.0}
],
"subtotal": 5930.0,
"discount": 612.0,
"gst": 0.0,
"net_total": 6542.0
}
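Because the output is plain JSON, simple arithmetic checks can flag suspicious parses before they reach the owner. The sketch below is not part of the released model: `check_receipt` is a hypothetical helper that assumes the field names from the example output, and it assumes each line's `total` equals `qty_units * unit_cost`. It does not try to reconcile `net_total`, since this card does not specify how discount and GST combine.

```python
import math

def check_receipt(parsed: dict, tol: float = 0.01) -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    items = parsed.get("items", [])
    for i, item in enumerate(items):
        # Assumption: total = qty_units * unit_cost (the card does not say
        # whether unit_cost is per unit or per case).
        expected = item["qty_units"] * item["unit_cost"]
        if not math.isclose(expected, item["total"], abs_tol=tol):
            problems.append(f"item {i}: expected total {expected}, got {item['total']}")
    # Line totals should add up to the stated subtotal.
    line_sum = sum(item["total"] for item in items)
    if not math.isclose(line_sum, parsed.get("subtotal", 0.0), abs_tol=tol):
        problems.append(f"subtotal: lines sum to {line_sum}, got {parsed.get('subtotal')}")
    return problems

parsed = {
    "items": [
        {"product_raw": "Parle", "qty_cases": 1, "qty_units": 1, "unit_cost": 2450.0, "total": 2450.0},
        {"product_raw": "Bingo(C)", "qty_cases": 4, "qty_units": 4, "unit_cost": 870.0, "total": 3480.0},
    ],
    "subtotal": 5930.0,
}
print(check_receipt(parsed))  # prints [] for the line items and subtotal above
```

Any non-empty result could be surfaced next to the parsed receipt so the owner knows which lines to double-check.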
Inference
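from huggingface_hub import InferenceClient
client = InferenceClient()
# Prompt template: instruction, receipt text, then an empty response section for the model to complete.
prompt = """### Instruction:
You are a receipt parser for an Indian convenience store. Extract all line items. Return ONLY valid JSON, no markdown.
### Input:
<paste receipt text here>
### Response:
"""
# With default arguments, text_generation returns the generated text as a string.
result = client.text_generation(prompt, model="summerdevlin46/dukaan-saathi-receipt-lora", max_new_tokens=768)
print(result)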
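Although the prompt asks for JSON only, a small model can still wrap its answer in a code fence or add stray text. The following is a defensive parsing sketch, not something shipped with the model: `extract_json` is a hypothetical helper that takes the string returned by `text_generation`, keeps the span from the first `{` to the last `}`, and passes it to `json.loads`.

```python
import json

def extract_json(generated: str) -> dict:
    """Parse the span from the first '{' to the last '}' in the model output."""
    start = generated.find("{")
    end = generated.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    return json.loads(generated[start : end + 1])

# Simulated model output wrapped in a markdown fence, built without literal backtick runs.
fence = "`" * 3
sample = fence + 'json\n{"supplier": "Mahalakshmi Marketing", "items": []}\n' + fence
print(extract_json(sample)["supplier"])  # prints Mahalakshmi Marketing
```

Catching `ValueError` at the call site covers both the "no braces" case and malformed JSON, so a failed parse can be routed to manual entry instead of crashing.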
Limitations
- Small training set; overfits to known receipt styles (Mahalakshmi Marketing, Sri Venkateshwara Marketing, Brundavan Buns)
- Owner approval gate always required before any inventory write
- Not a general-purpose receipt parser
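The owner approval gate listed under Limitations can be pictured as a thin wrapper that refuses to write unless approval is passed in explicitly. This is a hypothetical sketch: `apply_receipt`, the `inventory` dict, and the `approved` flag are illustrative names, not the demo's actual code.

```python
def apply_receipt(inventory: dict, parsed: dict, approved: bool) -> dict:
    """Add parsed quantities to inventory only after explicit owner approval."""
    if not approved:
        raise PermissionError("owner approval required before inventory write")
    updated = dict(inventory)  # copy so the caller's dict is left untouched
    for item in parsed.get("items", []):
        name = item["product_raw"]
        updated[name] = updated.get(name, 0) + item["qty_units"]
    return updated

parsed = {"items": [{"product_raw": "Parle", "qty_units": 1}]}
stock = apply_receipt({"Parle": 3}, parsed, approved=True)
print(stock)  # prints {'Parle': 4}
```

Making `approved` a required argument means a caller cannot write to inventory by accident; the decision has to be stated at every call site.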
Model tree for summerdevlin46/dukaan-saathi-receipt-lora
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Quantized: unsloth/Llama-3.2-3B-Instruct-bnb-4bit