PP-FormulaNet-L GGUF: Printed Math OCR
GGUF-quantized versions of PaddlePaddle's PP-FormulaNet-L for on-device printed formula recognition (image to LaTeX).
Model Description
| Property | Value |
|---|---|
| Architecture | SAM-ViT encoder + MBart Transformer decoder |
| Parameters | 181M |
| Input | 768x768 RGB image |
| Output | LaTeX token sequence |
| Vocab | 50,000 tokens (NougatTokenizer / BPE) |
| License | Apache-2.0 |
Encoder: SAM-style Vision Transformer
| Property | Value |
|---|---|
| Type | SAM ViT-B (from PaddleOCR Vary_VIT_B_Formula) |
| Layers | 12 |
| Hidden dim | 768 |
| Heads | 12 |
| MLP dim | 3072 |
| Patch size | 16x16 (48x48 patches) |
| Attention | Windowed (ws=14) on layers 0,1,3,4,6,7,9,10; Global on layers 2,5,8,11 |
| Position bias | Decomposed relative position (per-axis, interpolated) |
| Neck | Conv1x1 + LayerNorm2d + Conv3x3 + LayerNorm2d (768 -> 256) |
| Projector | 2x Conv3x3(stride=2) + 2x Linear (256 -> 512). Output: 144 tokens x 512d |
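The encoder numbers in the table fit together through simple shape arithmetic. The sketch below recomputes them under two assumptions that the table does not state: the two projector convolutions use padding 1, and windowing works as in SAM, padding the 48x48 grid up to a multiple of the window size before tiling it. The helper names are illustrative only.

```python
import math

IMG = 768                      # input side length (pixels)
PATCH = 16                     # patch size
WINDOW = 14                    # window size (ws) for windowed-attention layers
NUM_LAYERS = 12
GLOBAL_LAYERS = {2, 5, 8, 11}  # layers with global attention, per the table

def patch_grid(img=IMG, patch=PATCH):
    """Patches per side: 768 / 16 = 48."""
    assert img % patch == 0
    return img // patch

def window_partition(grid=patch_grid(), ws=WINDOW):
    """SAM-style windowing: pad the grid up to a multiple of ws, then tile it.
    Returns (padded side, windows per side, total windows)."""
    padded = math.ceil(grid / ws) * ws
    per_side = padded // ws
    return padded, per_side, per_side * per_side

def attention_schedule(num_layers=NUM_LAYERS, global_layers=GLOBAL_LAYERS):
    """Attention type of each encoder layer, in order."""
    return ["global" if i in global_layers else "window" for i in range(num_layers)]

def conv_out(size, kernel=3, stride=2, padding=1):
    """Standard conv output size; padding=1 is an assumption."""
    return (size + 2 * padding - kernel) // stride + 1

def projector_tokens(grid=patch_grid()):
    """Two stride-2 Conv3x3 layers shrink 48 -> 24 -> 12, giving 12 * 12 tokens."""
    side = conv_out(conv_out(grid))
    return side * side

print(patch_grid())        # 48
print(window_partition())  # (56, 4, 16)
print(projector_tokens())  # 144
```

Each windowed layer therefore attends within 16 windows of 14x14 patches, while the four global layers attend over all 2,304 patches.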
Decoder: MBart (Pre-LayerNorm)
| Property | Value |
|---|---|
| Layers | 8 |
| Heads | 16 |
| d_model | 512 |
| FFN dim | 2048 |
| Activation | GELU |
| Embedding | scale_embedding = sqrt(512) |
| Max length | 1024 tokens |
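To show how the decoder settings combine, the snippet below sketches the per-layer Pre-LayerNorm ordering, the sqrt(d_model) embedding scale and an exact erf-based GELU. It assumes the stock MBart layer layout (self-attention, then cross-attention, then FFN). Stock MBart also applies LayerNorms to the summed embeddings and after the last layer; those are omitted here. Sublayers are passed in as plain callables, so none of this is CrispEmbed code.

```python
import math

D_MODEL = 512
NUM_HEADS = 16
HEAD_DIM = D_MODEL // NUM_HEADS   # 32 dims per head
EMBED_SCALE = math.sqrt(D_MODEL)  # scale_embedding, about 22.63

def gelu(x):
    """Exact (erf-based) GELU; assumed to be the decoder's activation variant."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def embed(token_vec, pos_vec, scale=EMBED_SCALE):
    """Token embedding scaled by sqrt(d_model), plus the (unscaled) position embedding."""
    return [scale * t + p for t, p in zip(token_vec, pos_vec)]

def add(a, b):
    """Element-wise residual addition."""
    return [x + y for x, y in zip(a, b)]

def pre_ln_decoder_layer(x, enc, ln1, self_attn, ln2, cross_attn, ln3, ffn):
    """Pre-LN: normalize before each sublayer, then add its output to the residual stream."""
    x = add(x, self_attn(ln1(x)))        # masked self-attention over generated tokens
    x = add(x, cross_attn(ln2(x), enc))  # attention over the 144 encoder tokens
    x = add(x, ffn(ln3(x)))              # 512 -> 2048 -> 512 MLP with GELU
    return x
```

Because the residual path is never normalized, a layer whose sublayers all output zeros returns its input unchanged, which is one quick sanity check for a port.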
Available Variants
| File | Quant | Size | Encoder cosine vs F32 | Notes |
|---|---|---|---|---|
| ppformulanet-l-f16.gguf | FP16 | 347 MB | baseline | Unquantized (FP16 weights) |
| ppformulanet-l-q8_0.gguf | Q8_0 | 241 MB | 0.999940 | Critical tensors in F16 |
| ppformulanet-l-q4_k.gguf | Q4_K | 122 MB | 0.997595 | Desktop/mobile target |
All three variants produce identical decoded LaTeX on the test formulas.
Recommended: ppformulanet-l-q8_0.gguf (241 MB). Near-lossless quality at 1.4x compression relative to F16.
For mobile/desktop: ppformulanet-l-q4_k.gguf (122 MB). Encoder cosine 0.9976 vs F32 at 2.8x compression relative to F16.
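The compression factors quoted above follow directly from the file sizes in the variants table. The sketch below recomputes them and compares them with the nominal bits per weight of ggml's Q8_0 and Q4_K block formats. Reading the gap between the two as the cost of tensors kept in F16 (plus file metadata) is an interpretation, not a measurement.

```python
SIZES_MB = {"f16": 347, "q8_0": 241, "q4_k": 122}  # from the variants table

def compression_vs_f16(name, sizes=SIZES_MB):
    """Measured size ratio F16 / variant."""
    return sizes["f16"] / sizes[name]

# Nominal bits per weight of the ggml block formats:
#   Q8_0: 32 int8 values + one fp16 scale = 34 bytes per 32 weights
#   Q4_K: 128 bytes of 4-bit values + 12 bytes of packed sub-block scales/mins
#         + fp16 d and dmin = 144 bytes per 256 weights
BPW = {"f16": 16.0, "q8_0": 34 * 8 / 32, "q4_k": 144 * 8 / 256}

def ideal_ratio(name):
    """Ratio if every tensor were stored in the given format."""
    return BPW["f16"] / BPW[name]

for q in ("q8_0", "q4_k"):
    print(f"{q}: measured {compression_vs_f16(q):.1f}x, ideal {ideal_ratio(q):.2f}x")
# q8_0: measured 1.4x, ideal 1.88x
# q4_k: measured 2.8x, ideal 3.56x
```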
Quantization Strategy
Q8_0 mode keeps critical tensors in FP16 for quality:
- Embeddings (token, position, patch)
- LayerNorm weights/biases
- Relative position bias tables (tiny, critical for attention geometry)
- LM head (determines output tokens)
- Neck and projector weights (encoder-decoder bottleneck)
Large attention/MLP weight matrices are quantized to Q8_0.
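For reference, Q8_0 stores weights in blocks of 32 with a single scale per block. The sketch below mirrors the round-trip of ggml's reference Q8_0 quantizer with one simplification: the per-block scale d stays a Python float instead of being rounded to FP16. Function names are illustrative.

```python
import math

BLOCK = 32  # Q8_0 block size

def _round_away(x):
    """Round half away from zero (like C roundf), not Python's round-half-to-even."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def quantize_q8_0(block):
    """One scale per block: d = max|x| / 127, q = round(x / d), so q lies in [-127, 127]."""
    assert len(block) == BLOCK
    amax = max(abs(v) for v in block)
    d = amax / 127.0
    inv = 1.0 / d if d else 0.0  # all-zero block: every q is 0
    return d, [_round_away(v * inv) for v in block]

def dequantize_q8_0(d, q):
    """Reconstruct approximate weights; per-element error is at most about d / 2."""
    return [d * v for v in q]
```

The error bound scales with the largest magnitude in each block, which is one reason small but sensitive tensors such as LayerNorm weights and relative position tables are kept in F16.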
Q4_K mode uses CrispEmbed's crispembed-quantize tool (K-quant with importance-weighted groups). LayerNorm and biases stay in F16; large matrices go to Q4_K.
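Both modes amount to a per-tensor storage decision. The sketch below encodes the rules listed above in one function. The name substrings are hypothetical (the converter's real tensor names may differ), and treating every 1-D tensor as F16 is an added assumption. For Q4_K it follows the text literally, so anything that is not a LayerNorm or bias, including the LM head, goes to Q4_K.

```python
# Hypothetical name substrings for the tensors Q8_0 mode keeps in F16.
Q8_KEEP_F16 = ("token_embd", "pos_embd", "patch_embd", "rel_pos",
               "lm_head", "neck", "projector")

def is_norm_or_bias(name):
    """LayerNorm parameters and biases (name convention assumed)."""
    return "norm" in name or name.endswith(".bias")

def choose_type(name, shape, mode):
    """Return the storage type for one tensor under mode 'q8_0' or 'q4_k'."""
    if mode not in ("q8_0", "q4_k"):
        raise ValueError(f"unknown mode: {mode}")
    if is_norm_or_bias(name):
        return "f16"  # both modes: LayerNorm weights/biases and biases stay F16
    if mode == "q8_0" and any(key in name for key in Q8_KEEP_F16):
        return "f16"  # Q8_0 mode: embeddings, rel-pos tables, LM head, neck, projector
    if len(shape) < 2:
        return "f16"  # assumption: 1-D tensors are never quantized
    return mode       # large attention/MLP matrices
```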
Usage with CrispEmbed
```bash
# CLI: architecture auto-detected from GGUF metadata
crispembed -m ppformulanet-l-q8_0.gguf --ocr formula.png
# Output: LaTeX string
# \zeta_{0}(\nu) = -\frac{\nu\varrho^{-2\nu}}{\pi} ...
```
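To run the CLI over several images from Python, a thin wrapper can build the same invocation for each file. This sketch uses only the -m and --ocr flags from the example above and assumes the LaTeX string is the only output on stdout.

```python
import subprocess

def ocr_command(model, image):
    """The CLI invocation from the example above, as an argument list."""
    return ["crispembed", "-m", str(model), "--ocr", str(image)]

def ocr_images(model, images):
    """Run the CLI once per image; map each path to its stdout (assumed to be LaTeX)."""
    results = {}
    for img in images:
        proc = subprocess.run(ocr_command(model, img),
                              capture_output=True, text=True, check=True)
        results[str(img)] = proc.stdout.strip()
    return results
```

For example, ocr_images("ppformulanet-l-q8_0.gguf", ["a.png", "b.png"]) returns a dict keyed by path. Each call starts a new process and reloads the model, so for large batches the C API, which keeps one context alive across calls, is likely the better fit.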
C API
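```c
#include <stdio.h>
#include "crispembed.h"

/* Fragment: pixels, w, h, channels describe an already-decoded image (interleaved 8-bit layout assumed). */
void *ctx = crispembed_math_ocr_init("ppformulanet-l-q8_0.gguf", 4); /* 4 = thread count (assumed) */
if (ctx != NULL) {
    int len = 0;
    const char *latex = crispembed_math_ocr_recognize(ctx, pixels, w, h, channels, &len);
    if (latex != NULL) printf("%s\n", latex); /* len: length of the LaTeX string (assumed) */
    crispembed_math_ocr_free(ctx);
}
```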
Architecture auto-detection reads general.architecture = "ppformulanet_l" from GGUF metadata.
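To check which architecture a file declares without loading the model, the GGUF header can be read directly. The sketch below is a minimal stdlib reader for the GGUF v2/v3 metadata layout (little-endian, uint64 counts and string lengths). It stops before the tensor-info section and is not how CrispEmbed itself parses the file.

```python
import io
import struct

# GGUF metadata value types -> struct formats (scalar types only)
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
STRING, ARRAY = 8, 9

class _Reader:
    def __init__(self, f):
        self.f = f

    def unpack(self, fmt):
        size = struct.calcsize(fmt)
        data = self.f.read(size)
        if len(data) != size:
            raise EOFError("truncated GGUF header")
        return struct.unpack(fmt, data)[0]

    def string(self):
        n = self.unpack("<Q")              # uint64 length, then UTF-8 bytes
        return self.f.read(n).decode("utf-8")

    def value(self, vtype):
        if vtype == STRING:
            return self.string()
        if vtype == ARRAY:                 # uint32 element type, uint64 count, elements
            etype = self.unpack("<I")
            count = self.unpack("<Q")
            return [self.value(etype) for _ in range(count)]
        return self.unpack(_SCALARS[vtype])

def read_gguf_metadata(f):
    """Return all metadata key/value pairs from an open binary GGUF stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    r = _Reader(f)
    if r.unpack("<I") < 2:
        raise ValueError("GGUF v1 is not handled by this sketch")
    r.unpack("<Q")                         # tensor count (unused here)
    kv_count = r.unpack("<Q")
    meta = {}
    for _ in range(kv_count):
        key = r.string()
        meta[key] = r.value(r.unpack("<I"))
    return meta

def metadata_from_bytes(blob):
    return read_gguf_metadata(io.BytesIO(blob))

def gguf_architecture(path):
    with open(path, "rb") as f:
        return read_gguf_metadata(f).get("general.architecture")
```

If a file matches the description above, gguf_architecture("ppformulanet-l-q8_0.gguf") should return "ppformulanet_l".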
Image Preprocessing
The model expects UniMERNet-style preprocessing:
- Convert to grayscale, then replicate to 3 channels
- Resize, preserving aspect ratio, to fit within 768x768
- Center-pad with black (0) to fill 768x768
- Normalize: mean=0.7931, std=0.1738
CrispEmbed handles this automatically when you pass raw image bytes.
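The geometry and normalization steps can be sketched as a few pure functions. Three details are assumptions rather than facts from the list above: BT.601 luma weights for grayscale, scaling 8-bit values to [0, 1] before applying mean/std, and putting the extra pixel of odd padding on the bottom/right. Small images are scaled up as well, which may differ from the reference pipeline.

```python
TARGET = 768
MEAN, STD = 0.7931, 0.1738

def to_gray(r, g, b):
    """BT.601 luma (assumed); the result is replicated to 3 channels afterwards."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def fit_and_pad(w, h, target=TARGET):
    """Scale so the image fits target x target with aspect ratio kept, then center-pad.
    Returns (new_w, new_h, pad_left, pad_top)."""
    scale = min(target / w, target / h)
    new_w = max(1, round(w * scale))
    new_h = max(1, round(h * scale))
    return new_w, new_h, (target - new_w) // 2, (target - new_h) // 2

def normalize(v):
    """v is an 8-bit intensity, scaled to [0, 1] before mean/std (assumed)."""
    return (v / 255.0 - MEAN) / STD
```

For a 1536x384 crop this gives a 768x192 image with 288 rows of padding above it. If normalization runs after padding, the black border ends up near -4.56.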
Parity Verification
Tested against HuggingFace PPFormulaNetForConditionalGeneration reference:
| Metric | F32 | Q8_0 | Q4_K |
|---|---|---|---|
| Encoder cosine similarity | 0.999962 | 0.999940 | 0.997595 |
| Top-1 token match | Yes | Yes | Yes |
| Full decode match | Yes | Yes | Yes |
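The encoder-cosine row compares encoder outputs (144 tokens x 512 dims) against the reference. A minimal version of that check, plus the exact-sequence test behind "Full decode match", might look like the sketch below. Whether the original harness computes cosine over the whole flattened output or per token is not stated, so treat this as an approximation.

```python
import math

def flatten(matrix):
    """144 x 512 encoder output -> one vector of 73,728 values."""
    return [v for row in matrix for v in row]

def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine undefined for a zero vector")
    return dot / (na * nb)

def full_decode_match(ref_tokens, test_tokens):
    """True only if both greedy decodes emit exactly the same token sequence."""
    return list(ref_tokens) == list(test_tokens)
```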
Credits
- PaddlePaddle/PaddleOCR: PP-FormulaNet-L architecture and weights (Apache-2.0)
- HuggingFace Transformers: safetensors conversion and reference implementation
- CrispEmbed: GGUF conversion, C++ inference engine, quantization
Conversion
```bash
# F16
python models/convert-ppformulanet-l-to-gguf.py \
  --model-dir PP-FormulaNet-L --output ppformulanet-l-f16.gguf --fp16

# Q8_0 (critical tensors in F16)
python models/convert-ppformulanet-l-to-gguf.py \
  --model-dir PP-FormulaNet-L --output ppformulanet-l-q8_0.gguf --q8_0

# Q4_K (from F16 via C quantizer)
crispembed-quantize ppformulanet-l-f16.gguf ppformulanet-l-q4_k.gguf q4_k
```
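The three conversion commands can also be scripted. The only ordering constraint visible in them is that the Q4_K step reads the F16 file, so F16 has to be produced first. The sketch below runs the commands in that order with check=True, so a failed step stops the chain. Paths and flags are copied from the commands above; the function names are illustrative.

```python
import subprocess

MODEL_DIR = "PP-FormulaNet-L"
CONVERT = "models/convert-ppformulanet-l-to-gguf.py"

def conversion_steps(model_dir=MODEL_DIR):
    """The three commands above, ordered so the F16 file exists before Q4_K reads it."""
    return [
        ["python", CONVERT, "--model-dir", model_dir,
         "--output", "ppformulanet-l-f16.gguf", "--fp16"],
        ["python", CONVERT, "--model-dir", model_dir,
         "--output", "ppformulanet-l-q8_0.gguf", "--q8_0"],
        ["crispembed-quantize", "ppformulanet-l-f16.gguf",
         "ppformulanet-l-q4_k.gguf", "q4_k"],
    ]

def run_all(steps=None):
    """Run each step in order; check=True raises on the first non-zero exit."""
    for cmd in steps or conversion_steps():
        subprocess.run(cmd, check=True)
```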
Model tree for cstr/ppformulanet-l-gguf: base model PaddlePaddle/PP-FormulaNet-L_safetensors