---
license: apache-2.0
tags:
- layout-detection
- document-ai
- rt-detr
- gguf
- crispembed
pipeline_tag: object-detection
---

# RT-DETRv2 Document Layout Detection (GGUF)

A GGUF conversion of the docling-layout-heron document layout analysis model for use with CrispEmbed. It detects 17 region types in document images.

## Architecture

- **Backbone**: ResNet-50-D (BN-folded Conv2d)
- **Encoder**: HybridEncoder (AIFI self-attention + FPN/PAN with CSP-RepVGG)
- **Decoder**: 6-layer transformer with deformable multi-scale cross-attention (300 queries)
- **Classes**: 17 (text, title, table, figure, formula, caption, section_header, list_item, footnote, page_header, page_footer, code, document_index, checkbox_selected, checkbox_unselected, form, key_value_region)
- **Parameters**: 42M
- **Source**: [docling-project/docling-layout-heron](https://huggingface.co/docling-project/docling-layout-heron) (Apache-2.0)
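
To make the decoder output concrete, here is a minimal sketch of how RT-DETR-style predictions are commonly turned into detections. It assumes each of the 300 queries yields 17 class logits and one box in normalized `(cx, cy, w, h)` form. The function names, the per-query argmax, and the `0.5` threshold are illustrative assumptions, not CrispEmbed's actual post-processing.

```python
import math

NUM_QUERIES = 300  # from the architecture list above
NUM_CLASSES = 17


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def cxcywh_to_xyxy(box, width, height):
    """Convert a normalized center-format box to pixel corner coordinates."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def decode(logits, boxes, width, height, threshold=0.5):
    """Keep the best class per query if its score clears the threshold.

    logits: one list of class logits per query (assumed layout).
    boxes:  one normalized (cx, cy, w, h) tuple per query (assumed layout).
    """
    detections = []
    for query_logits, box in zip(logits, boxes):
        scores = [sigmoid(v) for v in query_logits]
        class_id = max(range(len(scores)), key=scores.__getitem__)
        if scores[class_id] >= threshold:
            detections.append({
                "class_id": class_id,
                "score": scores[class_id],
                "box": cxcywh_to_xyxy(box, width, height),
            })
    # Highest-confidence detections first.
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections
```

The sketch returns class indices rather than names because the index-to-label mapping is defined by the model, not by the order of the class list above.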

## Variants

| File | Size | Format | Notes |
|------|------|--------|-------|
| layout-heron-f32.gguf | 161 MB | F32 | Full precision; intended for development |
| layout-heron-q8_0.gguf | 43 MB | Q8_0 | Recommended for inference |
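
As a sanity check on the sizes above, the following back-of-the-envelope sketch estimates each file from the rounded 42M parameter count. It assumes every tensor is stored in the named format, ignores GGUF metadata, and reports binary megabytes (MiB); real files usually keep some tensors at higher precision, so treat the numbers as approximations.

```python
PARAMS = 42_000_000  # rounded count from the architecture list
MIB = 1024 * 1024

F32_BYTES_PER_WEIGHT = 4
# ggml's Q8_0 format stores 32 int8 weights plus one fp16 scale per 34-byte block.
Q8_0_BYTES_PER_WEIGHT = 34 / 32


def estimate_mib(params, bytes_per_weight):
    """Estimate tensor payload size in MiB, ignoring file metadata."""
    return params * bytes_per_weight / MIB


f32_mib = estimate_mib(PARAMS, F32_BYTES_PER_WEIGHT)
q8_mib = estimate_mib(PARAMS, Q8_0_BYTES_PER_WEIGHT)

print(f"F32  ~{f32_mib:.0f} MiB")
print(f"Q8_0 ~{q8_mib:.0f} MiB")
```

Both estimates land close to the listed file sizes, which suggests the Q8_0 variant is roughly a quarter the size of F32 because each weight shrinks from 4 bytes to just over 1.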

## Usage

```bash
# CLI
./build/crispembed -m layout-heron --layout document.png --json

# Server
./build/crispembed-server --layout layout-heron-q8_0.gguf
curl -X POST http://localhost:8080/layout/detect -d '{"image": "page.png"}'
```

```python
from crispembed import CrispLayout
layout = CrispLayout("layout-heron-q8_0.gguf")
regions = layout.detect("document.png")
```
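
The server endpoint from the `curl` example can also be called from Python's standard library. This is a minimal sketch that mirrors that request; the helper names `build_request` and `detect_layout` are hypothetical, and because the response schema is not documented here, the decoded JSON is returned as-is.

```python
import json
import urllib.request


def build_request(image_path, base_url="http://localhost:8080"):
    """Build the POST request shown in the curl example."""
    payload = json.dumps({"image": image_path}).encode("utf-8")
    # No Content-Type is set on purpose: urllib then falls back to the same
    # form-encoded default that `curl -d` uses, matching the example above.
    return urllib.request.Request(
        f"{base_url}/layout/detect",
        data=payload,
        method="POST",
    )


def detect_layout(image_path, base_url="http://localhost:8080", timeout=30.0):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_request(image_path, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `crispembed-server` running as shown above, `detect_layout("page.png")` should return whatever JSON the server produces for that page.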

## Parity

- Encoder: every stage reaches cosine similarity 1.0 against the Hugging Face (HF) reference when both are given the exact same input
- Detection score: 0.934 (HF reference: 0.955)
- 14 parity bugs were found and fixed through a systematic layer-by-layer diff against the reference
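
The layer-by-layer comparison above can be sketched as follows: compute the cosine similarity of each intermediate tensor against the reference and report the first stage that falls below a threshold. The stage names, the flat-list tensor representation, and the `0.9999` threshold are assumptions for illustration; this is not the tooling actually used for the port.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened tensors of equal length."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero tensor")
    return dot / (norm_a * norm_b)


def first_divergence(reference, candidate, threshold=0.9999):
    """Return (stage, cos) for the first stage below threshold, else None.

    Both arguments map stage names to flattened tensors, in execution order.
    """
    for stage, ref_tensor in reference.items():
        cos = cosine_similarity(ref_tensor, candidate[stage])
        if cos < threshold:
            return stage, cos
    return None
```

Walking the stages in execution order matters: the first stage to diverge is usually where the bug lives, while later mismatches are often just downstream effects.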

## License

Apache-2.0 (same as upstream docling-layout-heron).