# LAION-CLAP: AEmotionStudio mirror

Mirror of the [LAION-CLAP](https://github.com/LAION-AI/CLAP) audio-text joint-embedding model weights, used by:

- Tessera's Find-Similar grain overlay (corpus map → click → top-K)
- The standalone **CLAP** panel: Text Search · Similar Clips · Auto-tag

Upstream: https://huggingface.co/lukewys/laion_clap

License: CC0-1.0.

## Format

We ship `.safetensors` only. That means no pickle, none of the `torch.load` failures caused by the `weights_only=True` default in PyTorch 2.6+, and files roughly 3× smaller than the upstream `.pt` because training metadata is dropped.

Each file contains the bare audio-encoder + text-encoder `state_dict`. Load it with `safetensors.torch.load_file(path)` followed by `module.model.load_state_dict(sd, strict=False)`. The legacy `load_ckpt(ckpt=...)` API still works against the upstream `.pt` files, but not against these.

## Files

- `630k-audioset-best.safetensors` (variant `general`, `amodel=HTSAT-tiny`): non-fusion HTSAT-tiny checkpoint trained on 630k clips + AudioSet (best validation). Pass `amodel='HTSAT-tiny'` to `laion_clap.CLAP_Module(...)`.
- `music_audioset_epoch_15_esc_90.14.safetensors` (variant `music`, `amodel=HTSAT-base`): music-specialized LAION-CLAP fine-tune; 90.14% on ESC-50. Better on music corpora, at the cost of a marginal regression on speech/SFX. Pass `amodel='HTSAT-base'` (NOT tiny: the music variant trains a bigger backbone).

## Loading

```python
import laion_clap
from safetensors.torch import load_file

# Non-fusion HTSAT-tiny architecture, matching the `general` checkpoint.
m = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-tiny')
sd = load_file('630k-audioset-best.safetensors')
# strict=False tolerates keys that differ between the file and the module.
m.model.load_state_dict(sd, strict=False)

# audio_array_list is a placeholder for your own waveform data.
emb = m.get_audio_embedding_from_data(audio_array_list)
```
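
Because each file only loads correctly into the matching backbone, it can help to keep the filename and `amodel` paired in one place. The following is a minimal sketch, not part of this mirror: `ClapVariant`, `VARIANTS`, `load_clap`, and `weights_dir` are hypothetical names, and `enable_fusion=False` for the `music` variant is an assumption (this README only states it for `general`). It also returns the result of `load_state_dict`, since `strict=False` does not raise on mismatched keys.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ClapVariant:
    """Hypothetical record pairing a weights file with its backbone."""
    filename: str
    amodel: str
    enable_fusion: bool = False  # assumed False for both variants


# Values taken from the "Files" section above.
VARIANTS = {
    "general": ClapVariant("630k-audioset-best.safetensors", "HTSAT-tiny"),
    "music": ClapVariant(
        "music_audioset_epoch_15_esc_90.14.safetensors", "HTSAT-base"
    ),
}


def load_clap(variant: str, weights_dir: str = "."):
    """Build a CLAP_Module for `variant` and load the mirrored weights.

    Returns (module, report), where `report` is the value returned by
    load_state_dict and exposes `missing_keys` and `unexpected_keys`.
    """
    # Imported lazily so VARIANTS can be inspected without the heavy deps.
    import laion_clap
    from safetensors.torch import load_file

    spec = VARIANTS[variant]
    module = laion_clap.CLAP_Module(
        enable_fusion=spec.enable_fusion, amodel=spec.amodel
    )
    sd = load_file(os.path.join(weights_dir, spec.filename))
    report = module.model.load_state_dict(sd, strict=False)
    return module, report
```

A call such as `m, report = load_clap("music")` would then let you inspect `report.missing_keys` and `report.unexpected_keys` before computing embeddings; a long list in either usually points to the wrong `amodel` for the file.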