---
license: apache-2.0
library_name: gguf
pipeline_tag: image-to-text
tags:
  - latex-ocr
  - math
  - formula-recognition
  - gguf
  - crispembed
  - ppformulanet
  - sam-vit
  - mbart
base_model: PaddlePaddle/PP-FormulaNet-L_safetensors
---

# PP-FormulaNet-L GGUF — Printed Math OCR

GGUF-quantized versions of [PaddlePaddle's PP-FormulaNet-L](https://huggingface.co/PaddlePaddle/PP-FormulaNet-L_safetensors) for on-device printed formula recognition (image to LaTeX).

## Model Description

| Property | Value |
|---|---|
| Architecture | SAM-ViT encoder + MBart Transformer decoder |
| Parameters | 181M |
| Input | 768x768 RGB image |
| Output | LaTeX token sequence |
| Vocab | 50,000 tokens (NougatTokenizer / BPE) |
| License | **Apache-2.0** |

### Encoder: SAM-style Vision Transformer

| Property | Value |
|---|---|
| Type | SAM ViT-B (from PaddleOCR Vary_VIT_B_Formula) |
| Layers | 12 |
| Hidden dim | 768 |
| Heads | 12 |
| MLP dim | 3072 |
| Patch size | 16x16 (48x48 patches) |
| Attention | Windowed (ws=14) on layers 0,1,3,4,6,7,9,10; Global on layers 2,5,8,11 |
| Position bias | Decomposed relative position (per-axis, interpolated) |
| Neck | Conv1x1 + LayerNorm2d + Conv3x3 + LayerNorm2d (768 -> 256) |
| Projector | 2x Conv3x3(stride=2) + 2x Linear (256 -> 512). Output: 144 tokens x 512d |
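
The grid and token counts in the table follow from simple arithmetic. The sketch below reproduces them under two assumptions that this card does not state: the projector's stride-2 convolutions use padding 1, and the patch grid is zero-padded up to a multiple of the window size before windowed attention (as in the original SAM implementation).

```python
import math

IMAGE_SIZE = 768        # input side length, from the model table
PATCH_SIZE = 16         # ViT patch size
WINDOW_SIZE = 14        # window size used by the windowed-attention layers
NUM_STRIDE2_CONVS = 2   # projector downsampling stages


def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Standard convolution output-size formula (padding=1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


# Patch grid: 768 / 16 = 48 patches per side.
grid = IMAGE_SIZE // PATCH_SIZE

# Assumed SAM-style padding: 48 is rounded up to 56 so it splits into 14x14 windows.
padded = math.ceil(grid / WINDOW_SIZE) * WINDOW_SIZE
windows = (padded // WINDOW_SIZE) ** 2

# Two stride-2 convolutions halve the grid twice: 48 -> 24 -> 12.
side = grid
for _ in range(NUM_STRIDE2_CONVS):
    side = conv_out(side)
tokens = side * side

print(f"grid={grid} padded={padded} windows={windows} tokens={tokens}")
```

The final `tokens` value matches the 144 encoder tokens listed for the projector output.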

### Decoder: MBart (Pre-LayerNorm)

| Property | Value |
|---|---|
| Layers | 8 |
| Heads | 16 |
| d_model | 512 |
| FFN dim | 2048 |
| Activation | GELU |
| Embedding | scale_embedding = sqrt(512) |
| Max length | 1024 tokens |
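
As a small illustration of the embedding row, MBart-style decoders with embedding scaling multiply each token embedding by sqrt(d_model) before adding positions. The helper name `scale_embedding` below is hypothetical and only mirrors the config flag; it is a sketch of the arithmetic, not CrispEmbed's code.

```python
import math

D_MODEL = 512     # decoder width, from the table
NUM_HEADS = 16    # attention heads, from the table

EMBED_SCALE = math.sqrt(D_MODEL)   # roughly 22.63
HEAD_DIM = D_MODEL // NUM_HEADS    # per-head width


def scale_embedding(vector):
    """Multiply one token embedding (a list of floats) by sqrt(d_model)."""
    return [EMBED_SCALE * x for x in vector]


print(f"embed_scale={EMBED_SCALE:.4f} head_dim={HEAD_DIM}")
```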

## Available Variants

| File | Quant | Size | Encoder cosine vs F32 | Notes |
|---|---|---|---|---|
| `ppformulanet-l-f16.gguf` | FP16 | 347 MB | baseline | Unquantized (half precision) |
| `ppformulanet-l-q8_0.gguf` | Q8_0 | 241 MB | 0.999940 | Critical tensors in F16 |
| `ppformulanet-l-q4_k.gguf` | Q4_K | 122 MB | 0.997595 | Desktop/mobile target |

All three produce **identical decoded LaTeX** on test formulas.

**Recommended: `ppformulanet-l-q8_0.gguf`** (241 MB). Near-lossless (encoder cosine 0.999940) at 1.4x compression vs F16.

**For mobile/desktop: `ppformulanet-l-q4_k.gguf`** (122 MB). Encoder cosine 0.997595 with identical decoded output on the test formulas, at 2.8x compression vs F16.
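
The compression ratios quoted above can be checked from the table, along with a rough effective bits-per-weight figure. This sketch assumes MB means 10^6 bytes (the card does not say whether MB or MiB is meant) and uses the rounded 181M parameter count, so the bits-per-weight values are approximate. They also include metadata and any tensors kept at higher precision.

```python
PARAMS = 181e6  # parameter count from the model table (rounded)

# File sizes from the variants table; MB is assumed to mean 10**6 bytes.
SIZES_MB = {"f16": 347, "q8_0": 241, "q4_k": 122}


def compression_ratio(name: str) -> float:
    """Size of the F16 file divided by the size of the named variant."""
    return SIZES_MB["f16"] / SIZES_MB[name]


def bits_per_weight(name: str) -> float:
    """Whole-file bits divided by parameter count (an effective average)."""
    return SIZES_MB[name] * 1e6 * 8 / PARAMS


for name in SIZES_MB:
    print(f"{name}: {compression_ratio(name):.1f}x vs F16, "
          f"~{bits_per_weight(name):.1f} bits/weight")
```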

### Quantization Strategy

**Q8_0 mode** keeps critical tensors in FP16 for quality:
- Embeddings (token, position, patch)
- LayerNorm weights/biases
- Relative position bias tables (tiny, critical for attention geometry)
- LM head (determines output tokens)
- Neck and projector weights (encoder-decoder bottleneck)

Large attention/MLP weight matrices are quantized to Q8_0.
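
For readers unfamiliar with the format, ggml's Q8_0 stores weights in blocks of 32 values, each block holding one FP16 scale plus 32 signed 8-bit codes (34 bytes, or 8.5 bits per weight). The sketch below shows the idea in plain Python. It is an illustration rather than the converter's code, and it keeps the scale as a Python float instead of rounding it to FP16.

```python
BLOCK = 32                     # Q8_0 groups weights into blocks of 32
BYTES_PER_BLOCK = 2 + BLOCK    # one FP16 scale + 32 int8 codes
BITS_PER_WEIGHT = BYTES_PER_BLOCK * 8 / BLOCK


def quantize_q8_0_block(values):
    """Quantize one block of 32 floats to (scale, list of 32 int8 codes)."""
    if len(values) != BLOCK:
        raise ValueError(f"expected {BLOCK} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * BLOCK
    # Note: Python's round() rounds halves to even; ggml uses roundf().
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, codes


def dequantize_q8_0_block(scale, codes):
    """Reconstruct approximate floats from a scale and its int8 codes."""
    return [scale * q for q in codes]
```

The reconstruction error per weight is at most half a quantization step (`scale / 2`), which is why tensors with a few large outliers lose more precision than well-behaved ones.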

**Q4_K mode** uses CrispEmbed's `crispembed-quantize` tool (K-quant with importance-weighted groups). LayerNorm and biases stay in F16; large matrices go to Q4_K.

## Usage with CrispEmbed

```bash
# CLI: the model architecture is auto-detected from GGUF metadata
crispembed -m ppformulanet-l-q8_0.gguf --ocr formula.png

# Output: LaTeX string
# \zeta_{0}(\nu) = -\frac{\nu\varrho^{-2\nu}}{\pi} ...
```

### C API

```c
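#include <stdio.h>
#include "crispembed.h"

// pixels, w, h, channels: decoded image buffer and its dimensions,
// supplied by the caller (not shown here).
void *ctx = crispembed_math_ocr_init("ppformulanet-l-q8_0.gguf", 4);
int len;  // output parameter, passed by pointer to the call below
const char *latex = crispembed_math_ocr_recognize(ctx, pixels, w, h, channels, &len);
printf("%s\n", latex);
crispembed_math_ocr_free(ctx);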
```

Architecture auto-detection reads `general.architecture = "ppformulanet_l"` from GGUF metadata.
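
To inspect that key without CrispEmbed, the GGUF header can be parsed directly. The following is a minimal standard-library sketch that assumes a little-endian GGUF v2 or v3 file; the function name `read_architecture` is made up for this example and is not part of any library.

```python
import struct

# GGUF scalar value-type IDs mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _unpack(f, fmt):
    """Read and decode one fixed-size value from the stream."""
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _unpack(f, "<Q")
    return f.read(length).decode("utf-8")


def _read_value(f, vtype):
    """Read one metadata value of the given type (arrays recurse)."""
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _unpack(f, "<I")
        count = _unpack(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    return _unpack(f, _SCALARS[vtype])


def read_architecture(f):
    """Return `general.architecture` from a binary GGUF stream, or None."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _unpack(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
    _unpack(f, "<Q")              # tensor count (unused here)
    kv_count = _unpack(f, "<Q")   # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _unpack(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

Calling `read_architecture(open("ppformulanet-l-q8_0.gguf", "rb"))` on one of the files from this repository should return the architecture string quoted above.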

## Image Preprocessing

The model expects UniMERNet-style preprocessing:
1. Convert to grayscale, replicate to 3 channels
2. Resize maintaining aspect ratio to fit 768x768
3. Center-pad with black (0) to fill 768x768
4. Normalize: mean=0.7931, std=0.1738

CrispEmbed handles this automatically when you pass raw image bytes.
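
For anyone reimplementing the pipeline, the geometry and normalization in steps 2 to 4 can be sketched as follows. This assumes pixels are scaled to [0, 1] before normalization and that small images are scaled up as well as down; neither is confirmed here. The grayscale conversion and the interpolation filter are left out because the card does not specify them.

```python
TARGET = 768                 # model input side length
MEAN, STD = 0.7931, 0.1738   # normalization constants from step 4


def fit_geometry(width, height, target=TARGET):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit.

    The image is scaled so its longer side equals `target`, then centered
    on a target x target canvas; the remaining border is padding.
    """
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize(pixel):
    """Map an 8-bit gray level to its normalized value (assumes /255 first)."""
    return (pixel / 255.0 - MEAN) / STD


print(fit_geometry(1536, 384))
print(round(normalize(0), 4), round(normalize(255), 4))
```

Because padding happens before normalization in the list above, the black border ends up at `normalize(0)` rather than at zero in the tensor the encoder sees.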

## Parity Verification

Tested against HuggingFace `PPFormulaNetForConditionalGeneration` reference:

| Metric | F32 | Q8_0 | Q4_K |
|---|---|---|---|
| Encoder cosine similarity | 0.999962 | 0.999940 | 0.997595 |
| Top-1 token match | Yes | Yes | Yes |
| Full decode match | Yes | Yes | Yes |
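
The encoder metric is a plain cosine similarity between two output tensors. A minimal sketch, assuming each encoder output (144 x 512) is flattened into a single vector before comparison; the exact reduction used for the table is not stated.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length float sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a reference output and a slightly perturbed one.
reference = [0.5, -1.0, 2.0, 0.25]
perturbed = [0.51, -0.99, 2.02, 0.24]
print(f"{cosine_similarity(reference, perturbed):.6f}")
```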

## Credits

- [PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) — PP-FormulaNet-L architecture and weights (Apache-2.0)
- [HuggingFace Transformers](https://huggingface.co/PaddlePaddle/PP-FormulaNet-L_safetensors) — safetensors conversion and reference implementation
- [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed) — GGUF conversion, C++ inference engine, quantization

## Conversion

```bash
# F16
python models/convert-ppformulanet-l-to-gguf.py \
    --model-dir PP-FormulaNet-L --output ppformulanet-l-f16.gguf --fp16

# Q8_0 (critical tensors in F16)
python models/convert-ppformulanet-l-to-gguf.py \
    --model-dir PP-FormulaNet-L --output ppformulanet-l-q8_0.gguf --q8_0

# Q4_K (from F16 via C quantizer)
crispembed-quantize ppformulanet-l-f16.gguf ppformulanet-l-q4_k.gguf q4_k
```