---
license: cc-by-nc-sa-4.0
pipeline_tag: feature-extraction
tags:
- bioacoustics
- audio-classification
- audio
- speech
- automatic-speech-recognition
library_name: transformers
datasets:
- agkphysics/AudioSet
- Fhrozen/FSD50k
- Loie/VGGSound
language:
- en
---

# BioME: A Resource-Efficient Bioacoustic Foundational Model

> BioME (**Bio**acoustic **M**odulation-aware **E**ncoder) is a resource-efficient audio encoder designed for bioacoustic applications. BioME is trained via layer-to-layer distillation from a high-capacity teacher model (BEATs), enabling strong representational transfer while significantly reducing the parameter count. To further improve ecological generalization, the model is pretrained on multi-domain data spanning speech, environmental sounds, and animal vocalizations. A key contribution is the integration of modulation-aware acoustic features via FiLM conditioning, injecting a DSP-inspired inductive bias that enhances feature disentanglement in low-capacity regimes.

You can read the full preprint [**here**](https://arxiv.org/abs/2602.09970).

---

## Checkpoints


| Model       | Parameters | Dim | Layers | Checkpoint                                                 |
| ----------- | ---------- | --- | ------ | ---------------------------------------------------------- |
| BioME Edge  | 6M         | 192 | 12     | [link](https://huggingface.co/Hguimaraes/biome_edge_bio)   |
| BioME Small | 26M        | 384 | 12     | [link](https://huggingface.co/Hguimaraes/biome_small_bio)  |
| BioME Base  | 76M        | 768 | 12     | [link](https://huggingface.co/Hguimaraes/biome_base_bio)   |

---


## 🚀 How To Use

**Installation**
```bash
pip install -U transformers
```

**Load Model and Extract Features**
```python
import torch
import torchaudio
from transformers import AutoModel

# Load the pre-trained model on the GPU when available, otherwise on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained("Hguimaraes/biome_edge_bio", trust_remote_code=True).to(device).eval()

# Load audio and resample to 16 kHz
# torchaudio.load returns (num_channels, wav_len); a mono file gives a batch of one
wav, sr = torchaudio.load("path/to/audio")
wav = torchaudio.functional.resample(
    wav,
    sr,
    16000,
    lowpass_filter_width=64,
    rolloff=0.9475937167399596,
    resampling_method="sinc_interp_kaiser",
    beta=14.769656459379492,
)

# Extract features (the input must live on the same device as the model)
with torch.no_grad():
    output = model(wav.to(device))

# output["last_hidden_states"]: final output (batch_size, seq_len, encoder_dim)
# output["hidden_states"]:      list of 12 elements with (batch_size, seq_len, encoder_dim) tensors (features for each layer)
```
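
For clip-level tasks such as species classification, the frame-level features usually need to be reduced to one vector per clip. The sketch below shows one common option: averaging each layer over time and then combining the layers with uniform or user-supplied weights. It is an illustration, not the pooling used in the paper. The helper name `pool_clip_embedding` is hypothetical, and the random tensors stand in for `output["hidden_states"]`, assuming the shapes described in the comments above (12 layers, 192 dimensions for BioME Edge).

```python
import torch


def pool_clip_embedding(hidden_states, layer_weights=None):
    """Reduce per-layer frame features to one embedding per clip.

    hidden_states: sequence of (batch_size, seq_len, encoder_dim) tensors.
    layer_weights: optional 1-D tensor with one weight per layer;
                   defaults to a uniform average over layers.
    Returns a (batch_size, encoder_dim) tensor.
    """
    stacked = torch.stack(list(hidden_states), dim=0)  # (layers, batch, seq, dim)
    per_layer = stacked.mean(dim=2)                     # average over time
    num_layers = stacked.shape[0]
    if layer_weights is None:
        layer_weights = torch.full((num_layers,), 1.0 / num_layers)
    weights = layer_weights.to(per_layer.dtype).view(-1, 1, 1)
    return (weights * per_layer).sum(dim=0)             # weighted sum over layers


# Stand-in for output["hidden_states"]: 12 layers, batch of 2, 50 frames, 192 dims.
# The batch size and frame count are arbitrary values chosen for illustration.
dummy_hidden_states = [torch.randn(2, 50, 192) for _ in range(12)]
clip_embedding = pool_clip_embedding(dummy_hidden_states)
print(clip_embedding.shape)
```

With real model outputs, replace `dummy_hidden_states` with `output["hidden_states"]`. Passing a one-hot `layer_weights` tensor selects a single layer, which can help when probing which depth works best for a given downstream task.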

For more details on the model architecture, please check the file [modeling_biome.py](https://huggingface.co/Hguimaraes/biome_edge_bio/blob/main/modeling_biome.py) 

---

## 📖 Citation

```bibtex
@article{guimaraes2026biome,
  title={BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications},
  author={Guimar{\~a}es, Heitor R and Tiwari, Abhishek and Abdollahi, Mahsa and Avila, Anderson R and Falk, Tiago H},
  journal={arXiv preprint arXiv:2602.09970},
  year={2026}
}
```

---

## Acknowledgement

Much of our code base (and even this README.md!) is based on the following repositories:

- [USAD](https://huggingface.co/MIT-SLS/USAD-Base)
- [BEATs](https://github.com/microsoft/unilm/tree/master/beats)
- [Llama 3](https://github.com/meta-llama/llama3)
- [OpenBEATs](https://shikhar-s.github.io/OpenBEATs/)

Thank you so much to the authors!