---
license: apache-2.0
tags:
- layout-detection
- document-ai
- rt-detr
- gguf
- crispembed
pipeline_tag: object-detection
---

# RT-DETRv2 Document Layout Detection (GGUF)

Document layout analysis model for CrispEmbed. Detects 17 region types in document images.

## Architecture

- **Backbone**: ResNet-50-D (BN-folded Conv2d)
- **Encoder**: HybridEncoder (AIFI self-attention + FPN/PAN with CSP-RepVGG)
- **Decoder**: 6-layer transformer with deformable multi-scale cross-attention (300 queries)
- **Classes**: 17 (text, title, table, figure, formula, caption, section_header, list_item, footnote, page_header, page_footer, code, document_index, checkbox_selected, checkbox_unselected, form, key_value_region)
- **Parameters**: 42M
- **Source**: [docling-project/docling-layout-heron](https://huggingface.co/docling-project/docling-layout-heron) (Apache-2.0)

## Variants

| File | Size | Format | Notes |
|------|------|--------|-------|
| layout-heron-f32.gguf | 161 MB | F32 | Full precision, development |
| layout-heron-q8_0.gguf | 43 MB | Q8_0 | Recommended for inference |

## Usage

```bash
# CLI
./build/crispembed -m layout-heron --layout document.png --json

# Server
./build/crispembed-server --layout layout-heron-q8_0.gguf

# Then, from another shell, send a JSON request
curl -X POST http://localhost:8080/layout/detect \
  -H "Content-Type: application/json" \
  -d '{"image": "page.png"}'
```

```python
from crispembed import CrispLayout

# Load the quantized model and detect layout regions in a page image
layout = CrispLayout("layout-heron-q8_0.gguf")
regions = layout.detect("document.png")
```

## Parity

- **Encoder**: all stages match the Hugging Face (HF) reference with cosine similarity 1.0 when given exactly the same input
- **Detection score**: 0.934 (HF reference: 0.955)
- 14 parity bugs were found and fixed through a systematic layer-by-layer diff against the reference

## License

Apache-2.0 (same as upstream docling-layout-heron).