PP-FormulaNet-L GGUF — Printed Math OCR

GGUF-quantized versions of PaddlePaddle's PP-FormulaNet-L for on-device printed formula recognition (image to LaTeX).

Model Description

Property	Value
Architecture	SAM-ViT encoder + MBart Transformer decoder
Parameters	181M
Input	768x768 RGB image
Output	LaTeX token sequence
Vocab	50,000 tokens (NougatTokenizer / BPE)
License	Apache-2.0

Encoder: SAM-style Vision Transformer

Property	Value
Type	SAM ViT-B (from PaddleOCR Vary_VIT_B_Formula)
Layers	12
Hidden dim	768
Heads	12
MLP dim	3072
Patch size	16x16 (48x48 patches)
Attention	Windowed (ws=14) on layers 0,1,3,4,6,7,9,10; Global on layers 2,5,8,11
Position bias	Decomposed relative position (per-axis, interpolated)
Neck	Conv1x1 + LayerNorm2d + Conv3x3 + LayerNorm2d (768 -> 256)
Projector	2x Conv3x3(stride=2) + 2x Linear (256 -> 512). Output: 144 tokens x 512d
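
The token count in the table follows from simple shape arithmetic. The sketch below reproduces it, assuming each 3x3 stride-2 convolution in the projector uses padding 1 (the padding is not stated above, so treat it as an assumption). The names `conv_out` and `GLOBAL_LAYERS` are illustrative only and are not CrispEmbed identifiers.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output length along one axis for a convolution (padding=1 is an assumption)."""
    return (n + 2 * padding - kernel) // stride + 1

IMAGE_SIZE = 768
PATCH_SIZE = 16
NUM_LAYERS = 12
GLOBAL_LAYERS = {2, 5, 8, 11}  # layers with global attention, per the table

grid = IMAGE_SIZE // PATCH_SIZE      # patches per side after patch embedding
side = conv_out(conv_out(grid))      # two stride-2 convolutions in the projector
num_tokens = side * side             # encoder tokens handed to the decoder

# Every third layer is global; the remaining layers use windowed attention.
attention = ["global" if i in GLOBAL_LAYERS else "windowed" for i in range(NUM_LAYERS)]

print(grid, side, num_tokens)
print(attention)
```

Under these assumptions the patch grid shrinks from 48 to 24 to 12 per side, which matches the 144 tokens listed for the projector output.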

Decoder: MBart (Pre-LayerNorm)

Property	Value
Layers	8
Heads	16
d_model	512
FFN dim	2048
Activation	GELU
Embedding	scale_embedding = sqrt(512)
Max length	1024 tokens

Available Variants

File	Quant	Size	Encoder cos vs F32	Notes
`ppformulanet-l-f16.gguf`	FP16	347 MB	baseline	Unquantized (half precision)
`ppformulanet-l-q8_0.gguf`	Q8_0	241 MB	0.999940	Critical tensors in F16
`ppformulanet-l-q4_k.gguf`	Q4_K	122 MB	0.997595	Desktop/mobile target

All three produce identical decoded LaTeX on test formulas.

Recommended: ppformulanet-l-q8_0.gguf (241 MB) — near-lossless quality at 1.4x compression vs F16.

For mobile/desktop: ppformulanet-l-q4_k.gguf (122 MB), 2.8x compression vs F16 with an encoder cosine similarity of 0.9976 and identical decoded LaTeX on the test formulas.
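
The compression figures quoted above can be checked directly from the file sizes in the table. This is only arithmetic on the listed sizes; the dictionary name `SIZES_MB` is illustrative.

```python
# File sizes in MB, copied from the variants table.
SIZES_MB = {"f16": 347, "q8_0": 241, "q4_k": 122}

# Compression ratio of each variant relative to the F16 file.
ratios = {name: SIZES_MB["f16"] / size for name, size in SIZES_MB.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```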

Quantization Strategy

Q8_0 mode keeps quality-critical tensors in F16:

Embeddings (token, position, patch)
LayerNorm weights/biases
Relative position bias tables (tiny, critical for attention geometry)
LM head (determines output tokens)
Neck and projector weights (encoder-decoder bottleneck)

Large attention/MLP weight matrices are quantized to Q8_0.

Q4_K mode uses CrispEmbed's crispembed-quantize tool (K-quant with importance-weighted groups). LayerNorm and biases stay in F16; large matrices go to Q4_K.
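
To make the Q8_0 step concrete, here is a minimal sketch of how one block of weights could be quantized and restored. It follows the general ggml Q8_0 layout (blocks of 32 values sharing one scale, stored as 8-bit integers), but it is a simplification: the scale is kept as a Python float rather than FP16, and the function names are hypothetical rather than taken from the conversion script.

```python
import random

BLOCK = 32  # Q8_0 groups weights into blocks of 32 values

def quantize_q8_0(values):
    """Quantize one block: returns (scale, list of ints in [-127, 127])."""
    assert len(values) == BLOCK
    amax = max(abs(v) for v in values)
    scale = amax / 127.0
    if scale == 0.0:
        return 0.0, [0] * BLOCK
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return scale, q

def dequantize_q8_0(scale, q):
    """Restore approximate float values from a quantized block."""
    return [scale * x for x in q]

random.seed(0)
block = [random.gauss(0.0, 0.02) for _ in range(BLOCK)]  # synthetic weights, not model data
scale, q = quantize_q8_0(block)
restored = dequantize_q8_0(scale, q)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.6f} max_abs_error={max_err:.6f}")
```

The rounding error per weight is bounded by half the block scale, which is one reason small, sensitive tensors such as position bias tables are cheaper to keep in F16 than to risk quantizing.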

Usage with CrispEmbed

# CLI — auto-detected from GGUF metadata
crispembed -m ppformulanet-l-q8_0.gguf --ocr formula.png

# Output: LaTeX string
# \zeta_{0}(\nu) = -\frac{\nu\varrho^{-2\nu}}{\pi} ...

C API

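#include <stdio.h>
#include "crispembed.h"

/* pixels, w, h, channels: decoded image buffer and its dimensions, supplied by the caller */
void *ctx = crispembed_math_ocr_init("ppformulanet-l-q8_0.gguf", 4);
int len = 0;  /* output parameter filled in by the call */
const char *latex = crispembed_math_ocr_recognize(ctx, pixels, w, h, channels, &len);
if (latex != NULL) {
    printf("%s\n", latex);
}
crispembed_math_ocr_free(ctx);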

Architecture auto-detection reads general.architecture = "ppformulanet_l" from GGUF metadata.
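
For readers who want to inspect that metadata key themselves, the following is a rough sketch of a GGUF header reader using only the standard library. It assumes GGUF version 2 or later (64-bit counts, little-endian) and reads only the key/value section. The function names are hypothetical, and the demo header is built in memory rather than read from one of the released files.

```python
import io
import struct

# Scalar GGUF value types mapped to struct formats (type ids from the GGUF spec).
SCALAR_FORMATS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
                  6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
TYPE_STRING, TYPE_ARRAY = 8, 9

def _read(f, fmt):
    return struct.unpack(fmt, f.read(struct.calcsize(fmt)))[0]

def _read_string(f):
    length = _read(f, "<Q")  # strings are length-prefixed with a uint64
    return f.read(length).decode("utf-8")

def _read_value(f, vtype):
    if vtype == TYPE_STRING:
        return _read_string(f)
    if vtype == TYPE_ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    return _read(f, SCALAR_FORMATS[vtype])

def read_gguf_metadata(f):
    """Return (version, metadata dict) from an open binary GGUF stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return version, meta

def _pack_string(s):
    data = s.encode("utf-8")
    return struct.pack("<Q", len(data)) + data

# Synthetic header with a single key, standing in for a real model file.
demo = (b"GGUF" + struct.pack("<IQQ", 3, 0, 1)
        + _pack_string("general.architecture")
        + struct.pack("<I", TYPE_STRING) + _pack_string("ppformulanet_l"))
version, meta = read_gguf_metadata(io.BytesIO(demo))
print(version, meta["general.architecture"])
```

With a real file, the same function could be called on `open("ppformulanet-l-q8_0.gguf", "rb")`. The exact set of other keys stored in these files is not documented here.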

Image Preprocessing

The model expects UniMERNet-style preprocessing:

Convert to grayscale, replicate to 3 channels
Resize maintaining aspect ratio to fit 768x768
Center-pad with black (0) to fill 768x768
Normalize: mean=0.7931, std=0.1738

CrispEmbed handles this automatically when you pass raw image bytes.
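
The geometry and normalization steps above can be sketched without any image library. This is an illustration under stated assumptions, not CrispEmbed's actual code: it assumes small images are scaled up as well as down, that padding is split evenly with any odd pixel going to the right/bottom, and that the mean and std apply to values scaled to [0, 1]. The function names are hypothetical.

```python
TARGET = 768
MEAN, STD = 0.7931, 0.1738  # values from the preprocessing list above

def fit_and_pad(width, height, target=TARGET):
    """Return (new_w, new_h, left, top) for an aspect-preserving, centered fit."""
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))
    left = (target - new_w) // 2   # padding columns on the left
    top = (target - new_h) // 2    # padding rows on top
    return new_w, new_h, left, top

def normalize(gray_value):
    """Normalize one 8-bit grayscale value (assumes scaling to [0, 1] first)."""
    return (gray_value / 255.0 - MEAN) / STD

# A wide 1536x768 formula image is halved and padded above and below.
print(fit_and_pad(1536, 768))
print(round(normalize(255), 4), round(normalize(0), 4))
```

Because the grayscale value is replicated to three channels, the same normalized value would be written to all three planes of the 768x768 input.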

Parity Verification

Tested against HuggingFace PPFormulaNetForConditionalGeneration reference:

Metric	F32	Q8_0	Q4_K
Encoder cosine similarity	0.999962	0.999940	0.997595
Top-1 token match	Yes	Yes	Yes
Full decode match	Yes	Yes	Yes
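
For reference, the cosine similarity metric in the table can be computed as below. The sketch assumes the encoder output (144 x 512 values) is flattened into one vector per model before comparison, which is an assumption about the test setup. The two short vectors are made-up illustrative values, not model outputs.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative stand-ins for flattened reference and quantized encoder outputs.
reference = [0.12, -0.50, 0.33, 0.90]
candidate = [0.11, -0.49, 0.35, 0.88]

print(f"{cosine_similarity(reference, candidate):.6f}")
```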

Credits

PaddlePaddle/PaddleOCR — PP-FormulaNet-L architecture and weights (Apache-2.0)
HuggingFace Transformers — safetensors conversion and reference implementation
CrispEmbed — GGUF conversion, C++ inference engine, quantization

Conversion

# F16
python models/convert-ppformulanet-l-to-gguf.py \
    --model-dir PP-FormulaNet-L --output ppformulanet-l-f16.gguf --fp16

# Q8_0 (critical tensors in F16)
python models/convert-ppformulanet-l-to-gguf.py \
    --model-dir PP-FormulaNet-L --output ppformulanet-l-q8_0.gguf --q8_0

# Q4_K (from F16 via C quantizer)
crispembed-quantize ppformulanet-l-f16.gguf ppformulanet-l-q4_k.gguf q4_k

Base model: PaddlePaddle/PP-FormulaNet-L_safetensors (cstr/ppformulanet-l-gguf is a quantized derivative).