---
library_name: birdclef
license: apache-2.0
tags:
  - bioacoustics
  - perch
  - audio-classification
---

# Perch v2 PyTorch — TF SavedModel Source (wrice/perch-v2-pytorch-tflite)

Google Perch v2 EfficientNet-B3 backbone, ported to PyTorch with weights extracted
directly from the TF SavedModel ().
**No ONNX intermediary.** Drop-in replacement for .

## Source model

TF SavedModel:   
TFLite variant: checked on Kaggle — not available for this model.

## Precision achieved

Tested on 5 BirdCLEF 2026 train soundscape files (12 clips × 160000 samples each, i.e. 5 s per clip at 32 kHz):

| Metric | Value |
|--------|-------|
| max_abs_diff vs TF SavedModel | ~8.88e-6 |
| atol=1e-5 pass | ✓ ALL 5 files |
| atol=1e-6 pass | ✗ (structural floor) |
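
The pass/fail rows above come down to comparing the largest absolute difference against each tolerance. A minimal sketch of that check, assuming the reference and ported outputs are already available as arrays; `tolerance_report` is a hypothetical helper, and the arrays here are synthetic stand-ins rather than real model outputs:

```python
import numpy as np


def tolerance_report(reference, candidate, atols=(1e-5, 1e-6)):
    """Return (max_abs_diff, {atol: passed}) for two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError("outputs must have the same shape")
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    return max_abs_diff, {atol: max_abs_diff <= atol for atol in atols}


# Synthetic stand-ins (NOT real Perch outputs): 12 clips, arbitrary width.
reference_out = np.zeros((12, 8), dtype=np.float32)
ported_out = reference_out.copy()
ported_out[0, 0] = np.float32(8.88e-6)  # mimic the worst-case diff in the table

max_abs_diff, passed = tolerance_report(reference_out, ported_out)
print(f"max_abs_diff = {max_abs_diff:.2e}")
for atol, ok in passed.items():
    print(f"atol={atol:g}: {'pass' if ok else 'fail'}")
```

With real data, `reference_out` would hold the TF SavedModel outputs and `ported_out` the PyTorch outputs for the same clips.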

## Why atol=1e-7 is not achievable

Two float32 rounding differences remain. Float64 control tests indicate that both are irreducible, because switching to float64 barely changes either one:

1. ** (mel spectrogram)**: the TF XLA FFT kernel accumulates in float32 in a different
   order than PyTorch. In the float64 control the diff drops only from 9e-6 to 6e-6
   (0.67x, not orders of magnitude), so the gap is structural: TF XLA vs FFTW/KissFFT.

2. **Conv2d + BatchNorm2d chain (backbone)**: TF XLA fuses conv+BN into a single FMA
   kernel, while PyTorch keeps them separate. In the float64 control the embedding diff
   stays at ~4.5e-7 (0.9x from 5e-7, essentially unchanged), so this gap is also
   structural: the two frameworks evaluate the same FP32 arithmetic in a different order.
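
Both items rest on the fact that float32 addition is not associative, so two kernels that sum the same values in a different order can round differently. The sketch below illustrates only that general effect with a deliberately extreme toy input; it does not reproduce the TF XLA or PyTorch kernels, and `sum_float32` is a hypothetical helper:

```python
import numpy as np


def sum_float32(values):
    """Left-to-right sum, rounding to float32 after every addition."""
    acc = np.float32(0.0)
    for v in values:
        acc = np.float32(acc + np.float32(v))
    return float(acc)


# Same three numbers, two evaluation orders.
order_a = sum_float32([1e8, 1.0, -1e8])  # the 1.0 is absorbed by 1e8, then cancelled
order_b = sum_float32([1e8, -1e8, 1.0])  # the large terms cancel first, 1.0 survives

# In float64 this toy input is exact in either order.
float64_a = sum([1e8, 1.0, -1e8])
float64_b = sum([1e8, -1e8, 1.0])

print("float32:", order_a, order_b)
print("float64:", float64_a, float64_b)
```

In a real network the per-operation discrepancy is around one unit in the last place instead of a full 1.0, but it compounds across the FFT and the conv+BN chain.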

## Weights relationship to wrice/perch-v2-pytorch

The ONNX export of the TF SavedModel preserves float32 values bit-for-bit. Consequently,
the two repos carry **numerically identical weights** (max weight diff = 0.00e+00).
The distinction is provenance: this repo was extracted directly from the TF SavedModel
without the ONNX intermediary.
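
A check like the "max weight diff" figure can be sketched as follows. This is an assumption about how such a comparison might be done, not the script used for this repo: `max_weight_diff` is a hypothetical helper, and the dictionaries are toy stand-ins for two checkpoints converted to NumPy arrays.

```python
import numpy as np


def max_weight_diff(weights_a, weights_b):
    """Largest absolute element-wise difference across two weight dicts."""
    if weights_a.keys() != weights_b.keys():
        raise KeyError("checkpoints do not contain the same tensor names")
    worst = 0.0
    for name, a in weights_a.items():
        a = np.asarray(a, dtype=np.float64)
        b = np.asarray(weights_b[name], dtype=np.float64)
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for {name}: {a.shape} vs {b.shape}")
        if a.size:
            worst = max(worst, float(np.max(np.abs(a - b))))
    return worst


# Toy stand-ins (hypothetical tensor names, NOT the real checkpoints).
direct = {
    "conv.weight": np.arange(6, dtype=np.float32).reshape(2, 3),
    "bn.bias": np.ones(3, dtype=np.float32),
}
via_onnx = {name: value.copy() for name, value in direct.items()}

identical_diff = max_weight_diff(direct, via_onnx)
print(f"max weight diff = {identical_diff:.2e}")
```

For actual PyTorch checkpoints, each state-dict tensor would first be converted to a NumPy array before being passed in.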

## Usage


## Files

- : 43.2 MB — PyTorch weights
- : ~261 KB — mel window + filterbank constants