---
library_name: birdclef
license: apache-2.0
tags:
- bioacoustics
- perch
- audio-classification
---

# Perch v2 PyTorch: TF SavedModel Source (wrice/perch-v2-pytorch-tflite)

Google Perch v2 EfficientNet-B3 backbone, ported to PyTorch with weights extracted directly from the TF SavedModel (). **No ONNX intermediary.** Drop-in replacement for .

## Source model

TF SavedModel:

TFLite variant: checked on Kaggle; not available for this model.

## Precision achieved

Tested on 5 BirdCLEF 2026 train soundscape files (12 clips × 160000 samples each):

| Metric | Value |
|--------|-------|
| max_abs_diff vs TF SavedModel | ~8.88e-6 |
| atol=1e-5 pass | ✓ ALL 5 files |
| atol=1e-6 pass | ✗ (structural floor) |

## Why atol=1e-7 is not achievable

Two irreducible float32 rounding differences, confirmed by float64 control tests:

1. ** (mel spectrogram)**: The TF XLA FFT kernel uses a different float32 accumulation order than PyTorch. Float64 control: diff drops 9e-6 → 6e-6 (0.67x, NOT orders of magnitude). Structural: TF XLA vs FFTW/KissFFT.
2. **Conv2d + BatchNorm2d chain (backbone)**: TF XLA fuses conv+BN into a single FMA kernel; PyTorch keeps them separate. Float64 control: embedding diff stays at ~4.5e-7 (0.9x from 5e-7, essentially unchanged). Structural: different FP32 evaluation order.

## Weights relationship to wrice/perch-v2-pytorch

The ONNX export of the TF SavedModel preserves float32 values bit-for-bit. Consequently, the two repos carry **numerically identical weights** (max weight diff = 0.00e+00). The distinction is provenance: this repo was extracted directly from the TF SavedModel without the ONNX intermediary.

## Usage

## Files

- : 43.2 MB, PyTorch weights
- : ~261 KB, mel window + filterbank constants
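
## Reproducing the tolerance check (sketch)

The pass/fail rows in the precision table above boil down to comparing a maximum absolute difference against a few `atol` thresholds. The snippet below is a minimal sketch of that comparison, not the script used to produce the reported numbers. The helper name `tolerance_report` is hypothetical, and the arrays are synthetic stand-ins rather than real TF SavedModel or PyTorch outputs; only the 12-clip batch size is taken from the test setup described above.

```python
import numpy as np


def tolerance_report(reference, candidate, atols=(1e-5, 1e-6, 1e-7)):
    """Return (max_abs_diff, {atol: passed}) for two same-shaped arrays.

    Hypothetical helper: a pure absolute-tolerance check (no relative term).
    """
    # Upcast so the subtraction itself adds no float32 rounding error.
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    max_abs_diff = float(np.max(np.abs(ref - cand)))
    passes = {atol: max_abs_diff <= atol for atol in atols}
    return max_abs_diff, passes


# Synthetic stand-ins for the two models' outputs (assumed shape: 12 clips x 8 values).
rng = np.random.default_rng(0)
reference = rng.standard_normal((12, 8))
candidate = reference.copy()
candidate[0, 0] += 8e-6  # inject one deviation just under the 1e-5 threshold

max_abs_diff, passes = tolerance_report(reference, candidate)
print(f"max_abs_diff = {max_abs_diff:.2e}")
for atol, ok in passes.items():
    print(f"atol={atol:g}: {'pass' if ok else 'fail'}")
```

With real outputs, `reference` would be the TF SavedModel result and `candidate` the PyTorch result for the same clips. Unlike `np.allclose` with its default arguments, this check applies no relative tolerance, so it matches a plain `atol` criterion.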