---
library_name: transformers
tags:
- llama
- invoice-extraction
- sft
- gguf
---
# Llama-base-3.1-8B-invoice-gguf-sft
A fine-tuned Llama-3.1-8B model optimized for **invoice understanding and extraction**.
This version is exported in **GGUF** format for efficient local inference with tools such as **llama.cpp**, **Ollama**, and **text-generation-webui**.
---
## Model Details
### Model Description
This model adapts Llama-3.1-8B for structured invoice field extraction.
The goal is to support tasks such as reading invoice text and identifying key fields (amount, date, vendor, tax, line items, etc.).
- **Developed by:** *muhammed-afsal-p-m*
- **Model type:** Auto-regressive language model (decoder-only)
- **Languages:** English (primary); other languages have not been evaluated
- **License:** *Not yet specified (e.g., MIT, Apache-2.0, or other); as a Llama-3.1-8B derivative, use is also governed by the Llama 3.1 Community License*
- **Fine-tuned from:** Llama-3.1-8B (Meta)
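This card does not document the prompt template or output format used during fine-tuning. As a hypothetical sketch only, the helpers below build an extraction prompt for the fields named in the description and pull the first JSON object out of the model's reply. The field names (`INVOICE_FIELDS`), the prompt wording, and the assumption that the model answers in JSON are all illustrative, not the model's confirmed format.

```python
import json

# Hypothetical field names, mirroring the fields listed in the model description.
INVOICE_FIELDS = ["vendor", "invoice_date", "total_amount", "tax", "line_items"]


def build_prompt(invoice_text: str) -> str:
    """Build an extraction prompt (assumed wording, not the training template)."""
    fields = ", ".join(INVOICE_FIELDS)
    return (
        "Extract the following fields from the invoice as a JSON object "
        f"with keys: {fields}. Use null for any field that is not present.\n\n"
        f"Invoice:\n{invoice_text}\n\nJSON:"
    )


def parse_json_reply(reply: str) -> dict:
    """Return the first JSON object found anywhere in the model's reply text."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(reply):
        if ch != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue  # not valid JSON starting here; keep scanning
        return obj
    raise ValueError("no JSON object found in model reply")
```

Pass the output of `build_prompt` to whichever GGUF runtime you use, then feed the generated text to `parse_json_reply`. Scanning for the first decodable object tolerates chatty replies such as `Sure! {...}` without relying on a fixed output prefix.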
### Model Sources
- **Repository:** https://huggingface.co/muhammed-afsal-p-m/Llama-base-3.1-8B-invoice-gguf-sft
---
## Uses
### Direct Use
Useful for:
- Invoice text understanding
- Extracting structured fields
- Document parsing prototypes
- Local inference via GGUF
### Downstream Use
Can be integrated into:
- RPA invoice pipelines
- Accounting automation
- OCR → LLM extraction stages
- Document indexing/search systems
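Because the model reads text only, an OCR → LLM pipeline runs OCR first and passes the recognized text to the model. The sketch below wires the two stages together with plain callables. `extract_documents`, `ocr`, and `generate` are hypothetical names standing in for whatever OCR engine and GGUF runtime you use, and the prompt and JSON reply format are assumptions.

```python
import json
from typing import Callable, List


def extract_documents(
    pages: List[str],
    ocr: Callable[[str], str],
    generate: Callable[[str], str],
) -> List[dict]:
    """Run OCR on each page, then ask the LLM to extract invoice fields.

    `ocr` maps a page (e.g. an image path) to text; `generate` maps a prompt
    to the model's reply. Both are placeholders to bind to real components.
    """
    results = []
    for page in pages:
        text = ocr(page)
        if not text.strip():
            # Skip the LLM call when OCR found no text, rather than letting
            # the model invent fields for an empty page.
            results.append({"source": page, "fields": None, "error": "empty OCR text"})
            continue
        prompt = f"Extract invoice fields as JSON.\n\nInvoice:\n{text}\n\nJSON:"
        reply = generate(prompt)
        try:
            fields, error = json.loads(reply), None
        except json.JSONDecodeError as exc:
            # Keep the failure visible instead of dropping the page silently.
            fields, error = None, f"invalid JSON: {exc.msg}"
        results.append({"source": page, "fields": fields, "error": error})
    return results
```

Keeping each stage behind a callable lets you swap OCR engines or inference back ends without touching the orchestration, and makes the pipeline testable with stubs.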
### Out-of-Scope Use
Not suited for:
- Legal/financial decision-making without human review
- High-stakes extraction requiring guaranteed accuracy
- Multi-language invoice parsing (not validated)
- Vision-based tasks: the model accepts text only, so scanned or image invoices must first be converted to text (e.g., with OCR)
---
## Bias, Risks, and Limitations
- Model accuracy depends heavily on the **quality and consistency** of invoice text.
- May hallucinate values for fields that are absent from the invoice instead of reporting them as missing.
- Invoices vary widely in structure; unseen formats may reduce reliability.
- Any biases in the fine-tuning data (invoice styles, languages, domain distribution) are likely to carry over into outputs.
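Since the model may fill in fields that are not actually on the invoice, one inexpensive safeguard is to check that each extracted string value occurs in the source text. The helper below is a hypothetical sketch of that idea. It uses a case- and whitespace-insensitive substring match, so it will also flag values the model legitimately reformats (for example, a date normalized to a different format); treat its output as "needs review", not "wrong".

```python
import re
from typing import Dict, List


def _normalize(s: str) -> str:
    # Lowercase and collapse whitespace so OCR line breaks don't block matches.
    return re.sub(r"\s+", " ", s).strip().lower()


def ungrounded_fields(fields: Dict[str, object], source_text: str) -> List[str]:
    """Return names of non-empty string fields whose value is not found in the source."""
    haystack = _normalize(source_text)
    flagged = []
    for name, value in fields.items():
        # Only string values are checked; None, numbers, and lists are skipped.
        if isinstance(value, str) and value.strip():
            if _normalize(value) not in haystack:
                flagged.append(name)
    return flagged
```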
### Recommendations
- Always verify extracted results.
- Use deterministic decoding (greedy, i.e. temperature 0) when consistent outputs are required.
- Validate outputs with rule-based post-processing.
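Rule-based post-processing can be as simple as re-checking the formats and arithmetic the model reports. The sketch below assumes a hypothetical output shape: `subtotal`, `tax`, and `total_amount` as decimal strings, `invoice_date` in ISO `YYYY-MM-DD` form, and `line_items` as a list of objects with an `amount` key. Adapt the rules to whatever schema your prompts actually request.

```python
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import List


def validate_invoice(fields: dict, tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []

    def money(name: str, value):
        # Parse a monetary value as Decimal, recording why it failed if it did.
        if value is None:
            problems.append(f"{name}: missing")
            return None
        try:
            return Decimal(str(value))
        except InvalidOperation:
            problems.append(f"{name}: not a number ({value!r})")
            return None

    # Under the assumed schema, dates must be ISO 8601 (YYYY-MM-DD).
    raw_date = fields.get("invoice_date")
    try:
        date.fromisoformat(str(raw_date))
    except ValueError:
        problems.append(f"invoice_date: not YYYY-MM-DD ({raw_date!r})")

    subtotal = money("subtotal", fields.get("subtotal"))
    tax = money("tax", fields.get("tax"))
    total = money("total_amount", fields.get("total_amount"))

    # The total should equal subtotal + tax, allowing for rounding.
    if None not in (subtotal, tax, total) and abs(subtotal + tax - total) > tolerance:
        problems.append(f"total_amount: {total} != subtotal {subtotal} + tax {tax}")

    # Line-item amounts should add up to the subtotal.
    items = fields.get("line_items") or []
    amounts = [money(f"line_items[{i}].amount", item.get("amount")) for i, item in enumerate(items)]
    if items and subtotal is not None and None not in amounts:
        items_sum = sum(amounts, Decimal(0))
        if abs(items_sum - subtotal) > tolerance:
            problems.append(f"line_items: sum {items_sum} != subtotal {subtotal}")

    return problems
```

Using `Decimal` rather than `float` avoids spurious mismatches from binary rounding when comparing currency amounts.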
---
## How to Get Started
```python
# The repository ships GGUF weights, so load them with a GGUF runtime
# such as ctransformers (shown here) or llama.cpp.
from ctransformers import AutoModelForCausalLM

model_name = "muhammed-afsal-p-m/Llama-base-3.1-8B-invoice-gguf-sft"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    model_file="model.gguf",  # replace with the actual .gguf file name in the repository
    model_type="llama",       # architecture of the base model
)

# The loaded model is callable: it takes a prompt string and returns generated text.
print(model("Extract invoice total from: ..."))
```