---
language: en
license: apache-2.0
tags:
  - document-ai
  - layoutlmv3
  - token-classification
  - receipt-extraction
  - invoice-extraction
  - base-model
datasets:
  - custom
metrics:
  - f1
  - precision
  - recall
---

# layoutlmv3-receipt-invoice

LayoutLMv3 model initialized for receipt and invoice field extraction.

## Model Status

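⚠️ **This is an initialized base model** that has not yet been fine-tuned on custom data. Until it is, the token-classification head is untrained and its predictions are not meaningful.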

- **Base Model**: `microsoft/layoutlmv3-base`
- **Status**: Ready for fine-tuning; deployable for pipeline testing
- **Custom Labels**: Configured for receipt/invoice field extraction

## Intended Use

This model is configured to extract the following fields from receipts and invoices:

### Supported Fields

- `O`
- `B-MerchantName`, `I-MerchantName`
- `B-MerchantAddress`, `I-MerchantAddress`
- `B-TransactionDate`, `I-TransactionDate`
- `B-Currency`, `I-Currency`
- `B-Total`, `I-Total`
- `B-TotalTax`, `I-TotalTax`
- `B-InvoiceNumber`, `I-InvoiceNumber`
- `B-Subtotal`, `I-Subtotal`
- `B-LineItems`, `I-LineItems`
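The label set follows a regular pattern: one `O` tag plus a `B-`/`I-` pair per field. As a minimal sketch, the list and its `label2id`/`id2label` mappings could be rebuilt as shown here. The variable names are illustrative, and the ordering is assumed to match the list above rather than being read from the repository's `config.json`.

```python
# Field names taken from the label list; the ordering is an assumption.
FIELDS = [
    "MerchantName", "MerchantAddress", "TransactionDate", "Currency",
    "Total", "TotalTax", "InvoiceNumber", "Subtotal", "LineItems",
]

# "O" first, then a B-/I- pair for every field.
LABELS = ["O"] + [f"{prefix}-{field}" for field in FIELDS for prefix in ("B", "I")]

label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(LABELS))  # 19
```

When a model is loaded from this repository, `model.config.id2label` remains the authoritative mapping; the sketch is only useful for preparing training data with matching label names.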

## Training Status

This repository contains:
- ✅ Base LayoutLMv3 architecture
- ✅ Custom label configuration for receipts/invoices
- ⏳ **Not yet fine-tuned** - using pre-trained weights from `microsoft/layoutlmv3-base`

### Training the Model

To fine-tune this model on your custom data:

```bash
# On RunPod GPU pod or local machine with GPU
python main.py --mode train --push-to-hub --version v1.0
```

This will:
1. Train on your labeled receipt/invoice data
2. Update this repository with fine-tuned weights
3. Tag the trained version (e.g., v1.0, v1.1, etc.)

## Usage

### Local Inference

```python
import torch
from PIL import Image
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

repo_id = "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"

# Load model and processor (apply_ocr=False: you supply words and boxes yourself)
model = LayoutLMv3ForTokenClassification.from_pretrained(repo_id)
processor = LayoutLMv3Processor.from_pretrained(repo_id, apply_ocr=False)
model.eval()

# Prepare inputs (you need OCR results: words and pixel-coordinate bounding boxes)
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Total:", "$10.99"]
boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]

# Normalize boxes to the 0-1000 range expected by LayoutLMv3
width, height = image.size
normalized_boxes = [[int(1000 * x0 / width), int(1000 * y0 / height),
                     int(1000 * x1 / width), int(1000 * y1 / height)]
                    for x0, y0, x1, y1 in boxes]

encoding = processor(image, words, boxes=normalized_boxes,
                     truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**encoding)

# One prediction per subword token (special tokens included), not per word
predictions = outputs.logits.argmax(-1).squeeze(0).tolist()
labels = [model.config.id2label[p] for p in predictions]
```
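The model predicts one label per subword token, so a word such as `$10.99` may receive several predictions. A common convention, assumed here rather than taken from this repository, is to keep the label of each word's first subword. The sketch below works on plain lists; with a fast tokenizer, the word indices would typically come from `encoding.word_ids(0)`. The helper name `word_level_labels` and the sample values are hypothetical.

```python
from typing import List, Optional


def word_level_labels(word_ids: List[Optional[int]], token_labels: List[str]) -> List[str]:
    """Keep the label of the first subword token of each word."""
    labels: List[str] = []
    seen = set()
    for word_id, label in zip(word_ids, token_labels):
        # None marks special tokens; a repeated id marks a later subword.
        if word_id is None or word_id in seen:
            continue
        seen.add(word_id)
        labels.append(label)
    return labels


# Hypothetical values shaped like encoding.word_ids(0) and decoded predictions.
word_ids = [None, 0, 1, 2, 2, 3, 3, 3, None]
token_labels = ["O", "B-MerchantName", "I-MerchantName", "O", "O",
                "B-Total", "I-Total", "I-Total", "O"]

print(word_level_labels(word_ids, token_labels))
# ['B-MerchantName', 'I-MerchantName', 'O', 'B-Total']
```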

### RunPod Serverless Deployment

This model is designed for deployment on RunPod Serverless:

1. **Build and push Docker image:**
   ```bash
   cd deployment/runpod/LayoutLMv3
   python deploy.py --action deploy
   ```

2. **Create RunPod endpoint:**
   - Docker Image: `registry.hf.space/your-username/layoutlmv3-inference:latest`
   - Environment Variables:
     - `HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt`
     - `HF_TOKEN=<your-token>`
     - `MODEL_VERSION=main` (or specific version tag after training)
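Inside the container, the three environment variables might be consumed roughly as follows. This is a sketch under assumptions, not the repository's actual handler code: the function names are hypothetical, and the `token` keyword assumes a recent `transformers` release (older releases used `use_auth_token`).

```python
import os
from typing import Mapping, Optional


def resolve_model_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the endpoint's environment variables into loader settings."""
    env = os.environ if env is None else env
    return {
        "repo_id": env["HF_REPO_ID"],                 # required
        "token": env.get("HF_TOKEN") or None,         # optional for public repos
        "revision": env.get("MODEL_VERSION", "main"),  # branch or version tag
    }


def load_model(settings: dict):
    """Load model and processor at the requested revision (hypothetical helper)."""
    # Imported lazily so resolve_model_settings works without transformers installed.
    from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

    kwargs = {"revision": settings["revision"], "token": settings["token"]}
    model = LayoutLMv3ForTokenClassification.from_pretrained(settings["repo_id"], **kwargs)
    processor = LayoutLMv3Processor.from_pretrained(
        settings["repo_id"], apply_ocr=False, **kwargs
    )
    return model.eval(), processor


settings = resolve_model_settings({"HF_REPO_ID": "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"})
print(settings["revision"])  # main
```

Pinning `MODEL_VERSION` to a tag keeps a running endpoint on known weights even after `main` is updated.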

## Model Architecture

- **Base**: microsoft/layoutlmv3-base
- **Task**: Token Classification
- **Input**: Image + Words + Bounding Boxes
- **Output**: Field labels (IOB tagging scheme)
- **Number of Labels**: 19
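The bounding-box input must use LayoutLMv3's 0-1000 coordinate range, and OCR engines sometimes return boxes that extend slightly past the image edge. A small helper along these lines can scale and clamp each box; the name `normalize_box` is illustrative, and the clamping step is a defensive assumption rather than something this repository is known to do.

```python
from typing import List, Sequence


def normalize_box(box: Sequence[float], width: int, height: int) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to 0-1000 and clamp it to that range."""
    x0, y0, x1, y1 = box
    scaled = [1000 * x0 / width, 1000 * y0 / height,
              1000 * x1 / width, 1000 * y1 / height]
    return [max(0, min(1000, int(value))) for value in scaled]


print(normalize_box([10, 10, 100, 30], width=200, height=100))  # [50, 100, 500, 300]
print(normalize_box([-5, 0, 250, 100], width=200, height=100))  # [0, 0, 1000, 1000]
```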

## Label Schema

The model uses IOB (Inside-Outside-Beginning) tagging:

- **O**: Outside any field
- **B-FieldName**: Beginning of a field
- **I-FieldName**: Inside/continuation of a field

### Example

```
Text:        ["Total:", "$", "10", ".", "99"]
Labels:      ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
Extracted:   Total: "$ 10 . 99"
```
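Turning a tagged sequence back into field values means grouping each `B-` tag with the `I-` tags that follow it. The function below is one possible decoder, not the post-processing shipped with this model; `decode_iob` is a hypothetical name, and treating a stray `I-` tag as the start of a new span is a design choice made for robustness.

```python
from typing import Dict, List, Optional


def decode_iob(words: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group IOB-tagged words into field values (one string per span)."""
    fields: Dict[str, List[str]] = {}
    name: Optional[str] = None
    span: List[str] = []
    # The trailing ("", "O") sentinel flushes a span that ends at the last word.
    for word, label in list(zip(words, labels)) + [("", "O")]:
        prefix, _, field = label.partition("-")
        if prefix == "I" and field == name:
            span.append(word)  # continuation of the open span
            continue
        if name is not None:
            fields.setdefault(name, []).append(" ".join(span))
        if prefix in ("B", "I"):
            name, span = field, [word]  # start a new span
        else:
            name, span = None, []  # outside any field
    return fields


# The caption word "Total:" is tagged O here, so only the amount is extracted.
words = ["Total:", "$", "10", ".", "99"]
labels = ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
print(decode_iob(words, labels))  # {'Total': ['$ 10 . 99']}
```

Each field maps to a list because a document can contain several spans of the same type, for example multiple `LineItems`.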

## Version History

| Version | Date | Description | Status |
|---------|------|-------------|--------|
| main | 2025-11-13 | Initialized with base model + custom labels | Base (not trained) |

After training, versions will be tagged (v1.0, v1.1, etc.).

## Training Configuration

When training is performed, the following configuration will be used:

```python
{
  "model_name": "microsoft/layoutlmv3-base",
  "learning_rate": 5e-05,
  "batch_size": 4,
  "num_epochs": 20,
  "warmup_steps": 500,
  "max_length": 512,
  "validation_split": 0.2,
  "random_seed": 42,
  "gradient_accumulation_steps": 2,
  "eval_steps": 100,
  "save_steps": 500,
  "logging_steps": 50
}
```
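Two consequences of this configuration are worth spelling out: the effective batch size per optimizer update, and how `warmup_steps` compares with the total number of updates. The sketch below assumes single-device training and a hypothetical dataset of 800 training samples. The keyword names in `training_kwargs` are real `transformers.TrainingArguments` fields, but whether `main.py` maps the configuration this way is an assumption.

```python
import math

config = {
    "learning_rate": 5e-05,
    "batch_size": 4,
    "num_epochs": 20,
    "warmup_steps": 500,
    "random_seed": 42,
    "gradient_accumulation_steps": 2,
    "eval_steps": 100,
    "save_steps": 500,
    "logging_steps": 50,
}

# Samples contributing to each optimizer update on a single device.
effective_batch_size = config["batch_size"] * config["gradient_accumulation_steps"]


def optimizer_steps(num_train_samples: int) -> int:
    """Approximate total optimizer updates over all epochs (single device)."""
    steps_per_epoch = math.ceil(num_train_samples / effective_batch_size)
    return steps_per_epoch * config["num_epochs"]


# Assumed mapping onto transformers.TrainingArguments keyword names.
training_kwargs = {
    "learning_rate": config["learning_rate"],
    "per_device_train_batch_size": config["batch_size"],
    "num_train_epochs": config["num_epochs"],
    "warmup_steps": config["warmup_steps"],
    "gradient_accumulation_steps": config["gradient_accumulation_steps"],
    "eval_steps": config["eval_steps"],
    "save_steps": config["save_steps"],
    "logging_steps": config["logging_steps"],
    "seed": config["random_seed"],
}

print(effective_batch_size)  # 8
print(optimizer_steps(800))  # 2000
```

With the hypothetical 800 samples, the 500 warmup steps would cover a quarter of training, so a smaller dataset may call for a shorter warmup.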

## Citation

```bibtex
@misc{layoutlmv3-receipt-invoice,
  author = {MK Digital GmbH},
  title = {LayoutLMv3 Receipt/Invoice Field Extraction},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
}

@article{huang2022layoutlmv3,
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
  journal={arXiv preprint arXiv:2204.08387},
  year={2022}
}
```

## License

Apache 2.0

## Contact

For questions or issues, please open an issue in the repository.