---
language: en
license: apache-2.0
tags:
- document-ai
- layoutlmv3
- token-classification
- receipt-extraction
- invoice-extraction
- base-model
datasets:
- custom
metrics:
- f1
- precision
- recall
---

# layoutlmv3-receipt-invoice

LayoutLMv3 model initialized for receipt and invoice field extraction.

## Model Status

⚠️ **This is an initialized base model** - not yet fine-tuned on custom data.

- **Base Model**: `microsoft/layoutlmv3-base`
- **Status**: Ready for deployment and fine-tuning
- **Custom Labels**: Configured for receipt/invoice field extraction

## Intended Use

This model is configured to extract the following fields from receipts and invoices:

### Supported Fields

[
"O",
"B-MerchantName", "I-MerchantName",
"B-MerchantAddress", "I-MerchantAddress",
"B-TransactionDate", "I-TransactionDate",
"B-Currency", "I-Currency",
"B-Total", "I-Total",
"B-TotalTax", "I-TotalTax",
"B-InvoiceNumber", "I-InvoiceNumber",
"B-Subtotal", "I-Subtotal",
"B-LineItems", "I-LineItems"
]

## Training Status

This repository contains:

- ✅ Base LayoutLMv3 architecture
- ✅ Custom label configuration for receipts/invoices
- ⏳ **Not yet fine-tuned** - using pre-trained weights from `microsoft/layoutlmv3-base`

### Training the Model

To fine-tune this model on your custom data:

```bash
# On RunPod GPU pod or local machine with GPU
python main.py --mode train --push-to-hub --version v1.0
```

This will:

1. Train on your labeled receipt/invoice data
2. Update this repository with fine-tuned weights
3. Tag the trained version (e.g., v1.0, v1.1, etc.)

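Fine-tuning a token classifier needs the label strings from the Supported Fields list mapped to integer class ids. The following is a minimal sketch of that mapping, not the repository's actual training code: the id order (taken from the list order in this card) and the names `LABELS`, `id2label`, and `label2id` are assumptions for illustration.

```python
# Label list copied from the "Supported Fields" section of this card.
LABELS = [
    "O",
    "B-MerchantName", "I-MerchantName",
    "B-MerchantAddress", "I-MerchantAddress",
    "B-TransactionDate", "I-TransactionDate",
    "B-Currency", "I-Currency",
    "B-Total", "I-Total",
    "B-TotalTax", "I-TotalTax",
    "B-InvoiceNumber", "I-InvoiceNumber",
    "B-Subtotal", "I-Subtotal",
    "B-LineItems", "I-LineItems",
]

# Assumption: class ids follow the list order above.
id2label = {i: label for i, label in enumerate(LABELS)}
label2id = {label: i for i, label in enumerate(LABELS)}

print(len(LABELS))  # 19, matching "Number of Labels" in this card
```

In the `transformers` library, dictionaries of this shape can be passed as the `id2label` and `label2id` keyword arguments to `from_pretrained`, which stores them in the model config. Whether `main.py` builds the mapping this way is not documented in this card.
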
## Usage

### Local Inference

```python
import torch
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor
from PIL import Image

# Load model and processor (apply_ocr=False: you supply words and boxes yourself)
model = LayoutLMv3ForTokenClassification.from_pretrained("mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt")
processor = LayoutLMv3Processor.from_pretrained("mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt", apply_ocr=False)
model.eval()

# Prepare inputs (you need OCR results: words and pixel-coordinate bounding boxes)
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Total:", "$10.99"]
boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]

# Normalize boxes to the 0-1000 range expected by LayoutLMv3
width, height = image.size
normalized_boxes = [
    [int(1000 * x0 / width), int(1000 * y0 / height), int(1000 * x1 / width), int(1000 * y1 / height)]
    for x0, y0, x1, y1 in boxes
]

encoding = processor(image, words, boxes=normalized_boxes, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**encoding)

# One predicted label id per token
predictions = outputs.logits.argmax(-1)
```

### RunPod Serverless Deployment

This model is designed for deployment on RunPod Serverless:

1. **Build and push Docker image:**

   ```bash
   cd deployment/runpod/LayoutLMv3
   python deploy.py --action deploy
   ```

2. **Create RunPod endpoint:**
   - Docker Image: `registry.hf.space/your-username/layoutlmv3-inference:latest`
   - Environment Variables:
     - `HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt`
     - `HF_TOKEN=`
     - `MODEL_VERSION=main` (or specific version tag after training)

## Model Architecture

- **Base**: microsoft/layoutlmv3-base
- **Task**: Token Classification
- **Input**: Image + Words + Bounding Boxes
- **Output**: Field labels (IOB tagging scheme)
- **Number of Labels**: 19

## Label Schema

The model uses IOB (Inside-Outside-Beginning) tagging:

- **O**: Outside any field
- **B-FieldName**: Beginning of a field
- **I-FieldName**: Inside/continuation of a field

### Example

```
Text:      ["Total:", "$", "10", ".", "99"]
Labels:    ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
Extracted: Total: "$ 10 . 99"
```

## Version History

| Version | Date | Description | Status |
|---------|------|-------------|--------|
| main | 2025-11-13 | Initialized with base model + custom labels | Base (not trained) |

After training, versions will be tagged (v1.0, v1.1, etc.).

## Training Configuration

When training is performed, the following configuration will be used:

```python
{
    "model_name": "microsoft/layoutlmv3-base",
    "learning_rate": 5e-05,
    "batch_size": 4,
    "num_epochs": 20,
    "warmup_steps": 500,
    "max_length": 512,
    "validation_split": 0.2,
    "random_seed": 42,
    "gradient_accumulation_steps": 2,
    "eval_steps": 100,
    "save_steps": 500,
    "logging_steps": 50
}
```

With `batch_size` 4 and `gradient_accumulation_steps` 2, each optimizer step sees an effective batch of 8 examples per device.

## Citation

```bibtex
@misc{layoutlmv3-receipt-invoice,
  author = {MK Digital GmbH},
  title = {LayoutLMv3 Receipt/Invoice Field Extraction},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
}

@article{huang2022layoutlmv3,
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
  journal={arXiv preprint arXiv:2204.08387},
  year={2022}
}
```

## License

Apache 2.0

## Contact

For questions or issues, please open an issue in the repository.
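
## Appendix: Decoding IOB Labels

The Label Schema section describes how `B-` and `I-` tags mark field spans. Turning a tagged word sequence back into field values can be sketched as below. This is an illustrative helper, not code from this repository: the function name `decode_iob` and its return shape are assumptions, and it expects one label per word, so token-level model predictions would first need to be mapped back to words.

```python
from typing import Dict, List, Optional


def decode_iob(words: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group IOB-tagged words into {field name: [extracted values]}."""
    fields: Dict[str, List[str]] = {}
    span_field: Optional[str] = None  # field of the span being built
    span_words: List[str] = []

    # A trailing "O" sentinel flushes the final open span.
    for word, label in zip(list(words) + [""], list(labels) + ["O"]):
        prefix, _, name = label.partition("-")

        # "I-X" extends the open span only if that span is also field X.
        if prefix == "I" and name == span_field:
            span_words.append(word)
            continue

        # Any other label closes the open span, if there is one.
        if span_field is not None:
            fields.setdefault(span_field, []).append(" ".join(span_words))

        # "B-X" opens a new span; "O" or an orphan "I-X" opens nothing.
        if prefix == "B":
            span_field, span_words = name, [word]
        else:
            span_field, span_words = None, []

    return fields


words = ["ACME", "Store", "Total:", "$10.99"]
labels = ["B-MerchantName", "I-MerchantName", "O", "B-Total"]
print(decode_iob(words, labels))
# {'MerchantName': ['ACME Store'], 'Total': ['$10.99']}
```

The sketch drops an `I-` tag that has no matching `B-` tag before it. That is one possible policy; treating such a tag as the start of a new span is an equally valid choice.
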