PP-FormulaNet-L GGUF: Printed Math OCR

GGUF-quantized versions of PaddlePaddle's PP-FormulaNet-L for on-device printed formula recognition (image to LaTeX).

Model Description

| Property | Value |
|---|---|
| Architecture | SAM-ViT encoder + MBart Transformer decoder |
| Parameters | 181M |
| Input | 768x768 RGB image |
| Output | LaTeX token sequence |
| Vocab | 50,000 tokens (NougatTokenizer / BPE) |
| License | Apache-2.0 |

Encoder: SAM-style Vision Transformer

| Property | Value |
|---|---|
| Type | SAM ViT-B (from PaddleOCR Vary_VIT_B_Formula) |
| Layers | 12 |
| Hidden dim | 768 |
| Heads | 12 |
| MLP dim | 3072 |
| Patch size | 16x16 (48x48 patches) |
| Attention | Windowed (ws=14) on layers 0,1,3,4,6,7,9,10; global on layers 2,5,8,11 |
| Position bias | Decomposed relative position (per-axis, interpolated) |
| Neck | Conv1x1 + LayerNorm2d + Conv3x3 + LayerNorm2d (768 -> 256) |
| Projector | 2x Conv3x3(stride=2) + 2x Linear (256 -> 512). Output: 144 tokens x 512d |
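
The shape figures in the encoder table can be checked with a little arithmetic. The sketch below is illustrative only: the convolution padding of 1 is an assumption (the table gives only kernel size and stride), and the window padding step follows the SAM reference implementation, which may or may not match this port exactly.

```python
IMAGE_SIZE = 768
PATCH_SIZE = 16
WINDOW_SIZE = 14
NUM_LAYERS = 12


def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length along one axis of a convolution (padding=1 is assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


# 768 / 16 = 48 patches per side.
GRID = IMAGE_SIZE // PATCH_SIZE

# SAM-style windowed attention pads the grid up to a multiple of the window
# size before splitting it: 48 -> 56, giving 4 x 4 = 16 windows.
PADDED_GRID = -(-GRID // WINDOW_SIZE) * WINDOW_SIZE
NUM_WINDOWS = (PADDED_GRID // WINDOW_SIZE) ** 2

# Every third layer uses global attention; the rest are windowed.
GLOBAL_LAYERS = [i for i in range(NUM_LAYERS) if i % 3 == 2]
WINDOWED_LAYERS = [i for i in range(NUM_LAYERS) if i % 3 != 2]

# Two stride-2 convolutions in the projector: 48 -> 24 -> 12 per side.
side = GRID
for _ in range(2):
    side = conv_out(side)
TOKENS = side * side  # 12 * 12 = 144 encoder tokens

print(GRID, PADDED_GRID, NUM_WINDOWS, TOKENS)
```

The 144 tokens are what the decoder cross-attends to, so the encoder output has a fixed shape of 144 x 512 for every input image.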

Decoder: MBart (Pre-LayerNorm)

| Property | Value |
|---|---|
| Layers | 8 |
| Heads | 16 |
| d_model | 512 |
| FFN dim | 2048 |
| Activation | GELU |
| Embedding | scale_embedding = sqrt(512) |
| Max length | 1024 tokens |
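
To make the embedding row concrete: with scale_embedding enabled, MBart multiplies each token embedding by sqrt(d_model) before adding position information. A minimal sketch of that scaling, using plain Python lists in place of tensors (the helper name is hypothetical):

```python
import math

D_MODEL = 512
EMBED_SCALE = math.sqrt(D_MODEL)  # roughly 22.63


def scale_token_embedding(vector):
    """Multiply one token embedding by sqrt(d_model). Hypothetical helper."""
    if len(vector) != D_MODEL:
        raise ValueError(f"expected {D_MODEL} values, got {len(vector)}")
    return [EMBED_SCALE * x for x in vector]


scaled = scale_token_embedding([0.01] * D_MODEL)
```

A runtime that omits this factor would still run, but would feed the decoder embeddings at the wrong magnitude.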

Available Variants

| File | Quant | Size | Encoder cos vs F32 | Notes |
|---|---|---|---|---|
| ppformulanet-l-f16.gguf | FP16 | 347 MB | baseline | Unquantized (half precision) |
| ppformulanet-l-q8_0.gguf | Q8_0 | 241 MB | 0.999940 | Critical tensors in F16 |
| ppformulanet-l-q4_k.gguf | Q4_K | 122 MB | 0.997595 | Desktop/mobile target |

All three produce identical decoded LaTeX on test formulas.

Recommended: ppformulanet-l-q8_0.gguf (241 MB), near-lossless quality at 1.4x compression vs F16.

For mobile/desktop: ppformulanet-l-q4_k.gguf (122 MB), good quality at 2.8x compression.
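
The compression figures follow directly from the file sizes listed above. A short check, using only those sizes:

```python
# File sizes in MB, as listed in the variants table.
SIZES_MB = {"f16": 347, "q8_0": 241, "q4_k": 122}


def compression_vs_f16(variant):
    """Size ratio of the F16 file to the given variant."""
    return SIZES_MB["f16"] / SIZES_MB[variant]


for name in SIZES_MB:
    print(f"{name}: {compression_vs_f16(name):.1f}x")
# f16: 1.0x
# q8_0: 1.4x
# q4_k: 2.8x
```

Q8_0 comes out smaller than a pure 8-bit/16-bit ratio would suggest is possible for the whole file only because most of the weight volume sits in the large matrices; the tensors kept in F16 are comparatively small.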

Quantization Strategy

Q8_0 mode keeps critical tensors in FP16 for quality:

  • Embeddings (token, position, patch)
  • LayerNorm weights/biases
  • Relative position bias tables (tiny, critical for attention geometry)
  • LM head (determines output tokens)
  • Neck and projector weights (encoder-decoder bottleneck)

Large attention/MLP weight matrices are quantized to Q8_0.
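
For readers unfamiliar with Q8_0, the sketch below shows the general idea in pure Python: weights are split into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. This illustrates the format only and is not the converter's code. Real GGML stores the scale as FP16 and rounds half away from zero, whereas this sketch keeps a Python float and uses Python's round().

```python
import math

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_q8_0(values):
    """Return a list of (scale, int8_values) blocks for a flat list of floats."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of 32")
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        chunk = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0
        inv = 1.0 / scale if scale else 0.0
        quants = [int(round(v * inv)) for v in chunk]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from (scale, int8_values) blocks."""
    return [scale * q for scale, quants in blocks for q in quants]


weights = [math.sin(i) for i in range(64)]
blocks = quantize_q8_0(weights)
restored = dequantize_q8_0(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The per-weight error is bounded by half a quantization step (scale / 2), which is consistent with the encoder similarity staying very close to 1 for Q8_0.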

Q4_K mode uses CrispEmbed's crispembed-quantize tool (K-quant with importance-weighted groups). LayerNorm and biases stay in F16; large matrices go to Q4_K.
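
The Q8_0 selection rule described above can be pictured as a name-based filter. The following is a hypothetical sketch: the tensor names and substring markers are invented for illustration and do not reflect the actual names used by the converter, and keeping every 1-D tensor in F16 is an assumption.

```python
# Hypothetical substrings marking tensors that stay in F16 in Q8_0 mode.
KEEP_F16_MARKERS = ("embed", "norm", "rel_pos", "lm_head", "neck", "projector")


def pick_q8_0_mode_type(name, n_dims):
    """Choose a storage type for one tensor (illustrative policy only)."""
    # Assumption: 1-D tensors (biases, LayerNorm parameters) stay in F16.
    if n_dims < 2:
        return "F16"
    if any(marker in name for marker in KEEP_F16_MARKERS):
        return "F16"
    # Remaining tensors are the large attention/MLP matrices.
    return "Q8_0"


examples = [
    ("decoder.layers.0.self_attn.q_proj.weight", 2),
    ("decoder.embed_tokens.weight", 2),
    ("encoder.blocks.0.attn.rel_pos_h", 2),
    ("encoder.blocks.0.norm1.weight", 1),
]
for tensor_name, dims in examples:
    print(tensor_name, "->", pick_q8_0_mode_type(tensor_name, dims))
```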

Usage with CrispEmbed

# CLI: architecture is auto-detected from GGUF metadata
crispembed -m ppformulanet-l-q8_0.gguf --ocr formula.png

# Output: LaTeX string
# \zeta_{0}(\nu) = -\frac{\nu\varrho^{-2\nu}}{\pi} ...

C API

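#include <stdio.h>
#include "crispembed.h"

/* pixels, w, h, channels: raw image buffer and its dimensions, supplied by the caller */
void *ctx = crispembed_math_ocr_init("ppformulanet-l-q8_0.gguf", 4);
int len = 0;
const char *latex = crispembed_math_ocr_recognize(ctx, pixels, w, h, channels, &len);
if (latex != NULL) {
    printf("%s\n", latex);  /* recognized LaTeX string */
}
crispembed_math_ocr_free(ctx);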

Architecture auto-detection reads general.architecture = "ppformulanet_l" from GGUF metadata.
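
To check which architecture string a file carries without loading the model, the GGUF header can be parsed directly. The sketch below is a minimal reader, assuming a little-endian GGUF file of version 2 or later; it is independent of CrispEmbed and all function names are illustrative.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
# Byte widths of fixed-size GGUF metadata value types, keyed by type id.
FIXED_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
TYPE_STRING = 8
TYPE_ARRAY = 9


def _read(stream, fmt):
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF data")
    return struct.unpack(fmt, data)[0]


def _read_string(stream):
    length = _read(stream, "<Q")
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of GGUF data")
    return data.decode("utf-8")


def _skip_value(stream, value_type):
    if value_type in FIXED_SIZES:
        stream.seek(FIXED_SIZES[value_type], io.SEEK_CUR)
    elif value_type == TYPE_STRING:
        _read_string(stream)
    elif value_type == TYPE_ARRAY:
        item_type = _read(stream, "<I")
        count = _read(stream, "<Q")
        for _ in range(count):
            _skip_value(stream, item_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def read_architecture(stream):
    """Return the general.architecture string, or None if the key is absent."""
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(stream, "<I")
    if version < 2:
        raise ValueError("this sketch only handles GGUF version 2 or later")
    _read(stream, "<Q")  # tensor count, not needed here
    kv_count = _read(stream, "<Q")
    for _ in range(kv_count):
        key = _read_string(stream)
        value_type = _read(stream, "<I")
        if key == "general.architecture" and value_type == TYPE_STRING:
            return _read_string(stream)
        _skip_value(stream, value_type)
    return None


# Usage (requires the file to be present locally):
# with open("ppformulanet-l-q8_0.gguf", "rb") as f:
#     print(read_architecture(f))
```

For the files in this repository the expected result is "ppformulanet_l", per the statement above.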

Image Preprocessing

The model expects UniMERNet-style preprocessing:

  1. Convert to grayscale, replicate to 3 channels
  2. Resize maintaining aspect ratio to fit 768x768
  3. Center-pad with black (0) to fill 768x768
  4. Normalize: mean=0.7931, std=0.1738

CrispEmbed handles this automatically when you pass raw image bytes.
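
For callers who preprocess images themselves, the geometry and normalization of the steps above can be sketched as follows. This covers only the arithmetic, not the image resampling. It assumes that images smaller than 768x768 are scaled up as well as down, and that pixel values are mapped to [0, 1] before the mean and std are applied; neither detail is stated above.

```python
TARGET = 768
MEAN = 0.7931
STD = 0.1738


def fit_geometry(width, height, target=TARGET):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving fit."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(target / width, target / height)
    new_w = max(1, min(target, round(width * scale)))
    new_h = max(1, min(target, round(height * scale)))
    # Center the resized image; any odd leftover pixel goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def normalize(gray_value):
    """Map an 8-bit grayscale value (0..255) to the normalized input range."""
    return (gray_value / 255.0 - MEAN) / STD


# A wide 1536x384 formula crop is scaled to 768x192 and padded top and bottom.
print(fit_geometry(1536, 384))
# Black padding (0) and white paper (255) after normalization.
print(normalize(0), normalize(255))
```

Because the grayscale image is replicated to three channels, the same normalized value is used for R, G, and B.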

Parity Verification

Tested against the Hugging Face PPFormulaNetForConditionalGeneration reference:

| Metric | F32 | Q8_0 | Q4_K |
|---|---|---|---|
| Encoder cosine similarity | 0.999962 | 0.999940 | 0.997595 |
| Top-1 token match | Yes | Yes | Yes |
| Full decode match | Yes | Yes | Yes |
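
The encoder metric is a cosine similarity between the reference encoder output and the GGUF runtime's output. A minimal sketch of that comparison, assuming both outputs are flattened to a single vector (how the published numbers were aggregated is not stated above):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for flattened 144 x 512 encoder outputs.
reference = [0.5, -1.0, 2.0, 0.25]
candidate = [0.5, -1.0, 2.0, 0.26]
print(cosine_similarity(reference, candidate))
```

Cosine similarity ignores overall scale, so the top-1 token and full decode rows remain the more direct check that quantization did not change the output.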

Credits

Conversion

# F16
python models/convert-ppformulanet-l-to-gguf.py \
    --model-dir PP-FormulaNet-L --output ppformulanet-l-f16.gguf --fp16

# Q8_0 (critical tensors in F16)
python models/convert-ppformulanet-l-to-gguf.py \
    --model-dir PP-FormulaNet-L --output ppformulanet-l-q8_0.gguf --q8_0

# Q4_K (from F16 via C quantizer)
crispembed-quantize ppformulanet-l-f16.gguf ppformulanet-l-q4_k.gguf q4_k