---
license: cc-by-nc-sa-4.0
pipeline_tag: feature-extraction
tags:
- bioacoustics
- audio-classification
- audio
- speech
- automatic-speech-recognition
library_name: transformers
datasets:
- agkphysics/AudioSet
- Fhrozen/FSD50k
- Loie/VGGSound
language:
- en
---

# BioME: A Resource-Efficient Bioacoustic Foundational Model

> BioME (**Bio**acoustic **M**odulation-aware **E**ncoder) is a resource-efficient audio encoder designed for bioacoustic applications. BioME is trained via layer-to-layer distillation from a high-capacity teacher model (BEATs), enabling strong representational transfer while significantly reducing the parameter count. To further improve ecological generalization, the model is pretrained on multi-domain data spanning speech, environmental sounds, and animal vocalizations. A key contribution is the integration of modulation-aware acoustic features via FiLM conditioning, which injects a DSP-inspired inductive bias that enhances feature disentanglement in low-capacity regimes.
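To make the FiLM idea above more concrete, here is a minimal PyTorch sketch under stated assumptions. It is not BioME's implementation. `modulation_features` is a hypothetical stand-in for modulation-aware features: it takes the STFT magnitude, applies a second FFT along time to get a per-bin modulation spectrum, and averages the lowest modulation frequencies into a small conditioning vector. `FiLM` is a generic feature-wise linear modulation layer that predicts a per-channel scale (gamma) and shift (beta) from that vector and applies them to every frame of an encoder layer's output. The STFT settings, `n_mod`, and the injection point are all assumptions. For the actual architecture, see `modeling_biome.py` in the released checkpoints.

```python
import torch
import torch.nn as nn


def modulation_features(wav: torch.Tensor, n_fft: int = 400, hop: int = 160, n_mod: int = 16) -> torch.Tensor:
    """Hypothetical clip-level modulation descriptor for (batch, samples) audio at 16 kHz."""
    window = torch.hann_window(n_fft, device=wav.device)
    # Magnitude STFT: (batch, n_fft // 2 + 1, frames); each row tracks one frequency bin over time.
    env = torch.stft(wav, n_fft, hop_length=hop, window=window, return_complex=True).abs()
    # A second FFT along time yields the modulation spectrum of every frequency bin.
    mod = torch.fft.rfft(env, dim=-1).abs()
    # Keep the n_mod lowest modulation frequencies, average over frequency bins, compress with log1p.
    return mod[..., :n_mod].mean(dim=1).log1p()  # (batch, n_mod)


class FiLM(nn.Module):
    """Generic feature-wise linear modulation: y = gamma(c) * x + beta(c)."""

    def __init__(self, cond_dim: int, feat_dim: int):
        super().__init__()
        # A single linear layer predicts both gamma and beta (2 * feat_dim outputs).
        self.proj = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, feat_dim); cond: (batch, cond_dim)
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        # unsqueeze(1) broadcasts the per-clip scale and shift over the time axis.
        return gamma.unsqueeze(1) * x + beta.unsqueeze(1)


if __name__ == "__main__":
    torch.manual_seed(0)
    wav = torch.randn(2, 16000)       # two 1-second clips at 16 kHz (random noise)
    hidden = torch.randn(2, 50, 192)  # stand-in for one encoder layer (192 = BioME Edge dim)
    cond = modulation_features(wav)   # (2, 16)
    film = FiLM(cond_dim=16, feat_dim=192)
    print(film(hidden, cond).shape)   # torch.Size([2, 50, 192])
```

Because gamma and beta here depend only on a clip-level vector, the same scale and shift apply to all time steps. A per-frame variant would instead predict them from a `(batch, seq_len, cond_dim)` input.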
You can read the full preprint [**here**](https://arxiv.org/abs/2602.09970).

---

## Checkpoints

| Model       | Parameters | Dim | Layers | Checkpoint                                                |
| ----------- | ---------- | --- | ------ | --------------------------------------------------------- |
| BioME Edge  | 6M         | 192 | 12     | [link](https://huggingface.co/Hguimaraes/biome_edge_bio)  |
| BioME Small | 26M        | 384 | 12     | [link](https://huggingface.co/Hguimaraes/biome_small_bio) |
| BioME Base  | 76M        | 768 | 12     | [link](https://huggingface.co/Hguimaraes/biome_base_bio)  |

---

## 🚀 How To Use

**Installation**

```bash
pip install -U transformers torchaudio
```

**Load Model and Extract Features**

```python
import torch
import torchaudio
from transformers import AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load pre-trained model
model = AutoModel.from_pretrained("Hguimaraes/biome_edge_bio", trust_remote_code=True).to(device).eval()

# Load audio: torchaudio.load returns a (channels, num_samples) tensor and the sample rate
wav, sr = torchaudio.load("path/to/audio")
# Downmix to mono so the input is (batch_size=1, wav_len)
wav = wav.mean(dim=0, keepdim=True)

# Resample to 16 kHz
wav = torchaudio.functional.resample(
    wav,
    sr,
    16000,
    lowpass_filter_width=64,
    rolloff=0.9475937167399596,
    resampling_method="sinc_interp_kaiser",
    beta=14.769656459379492,
)

# Extract features
with torch.no_grad():
    output = model(wav.to(device))

# output["last_hidden_states"]: final output (batch_size, seq_len, encoder_dim)
# output["hidden_states"]: list of 12 tensors of shape (batch_size, seq_len, encoder_dim), one per layer
```

For more details on the model architecture, see [modeling_biome.py](https://huggingface.co/Hguimaraes/biome_edge_bio/blob/main/modeling_biome.py).

---

## 📖 Citation

```bibtex
@article{guimaraes2026biome,
  title={BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications},
  author={Guimar{\~a}es, Heitor R and Tiwari, Abhishek and Abdollahi, Mahsa and Avila, Anderson R and Falk, Tiago H},
  journal={arXiv preprint arXiv:2602.09970},
  year={2026}
}
```

---

## Acknowledgement

Much of our code base (and even this README.md!)
is based on the following repositories:

- [USAD](https://huggingface.co/MIT-SLS/USAD-Base)
- [BEATs](https://github.com/microsoft/unilm/tree/master/beats)
- [Llama 3](https://github.com/meta-llama/llama3)
- [OpenBEATs](https://shikhar-s.github.io/OpenBEATs/)

Thank you so much to the authors!