---
library_name: audio-interv
tags:
- activation-steering
- audio
- audioldm
- audioldm2
- caa
- diffusion
- interpretability
- music
- steering
- vocal-gender
---
# CAA — `vocal_gender` (AudioLDM2)
Steering vectors for the **vocal_gender** concept on AudioLDM2, computed via contrastive activation addition (CAA).
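As a rough illustration of the idea behind CAA, a steering vector can be taken as the difference between the mean activations of two contrasting sets, and added to an activation with a strength `alpha`. The NumPy sketch below is not this repository's implementation: the function names `caa_steering_vector` and `apply_steering` are hypothetical, the activations are toy data, and the unit-norm step is only assumed to mirror the `normalize_sv` flag in the generation config.

```python
import numpy as np


def caa_steering_vector(pos_acts, neg_acts, normalize=True):
    """Mean-difference steering vector from two sets of activations.

    pos_acts / neg_acts: arrays of shape (num_samples, hidden_dim).
    """
    pos = np.asarray(pos_acts, dtype=np.float64)
    neg = np.asarray(neg_acts, dtype=np.float64)
    v = pos.mean(axis=0) - neg.mean(axis=0)
    if normalize:
        norm = np.linalg.norm(v)
        if norm > 0:  # avoid dividing by zero when both means coincide
            v = v / norm
    return v


def apply_steering(activation, v, alpha=1.0):
    """Shift an activation along the steering direction by alpha."""
    return np.asarray(activation, dtype=np.float64) + alpha * v


# Toy data: the two sets differ only along the first dimension.
pos = np.array([[2.0, 0.0], [4.0, 0.0]])
neg = np.array([[0.0, 0.0], [0.0, 0.0]])

v = caa_steering_vector(pos, neg)
steered = apply_steering(np.array([1.0, 1.0]), v, alpha=0.5)
```

In this toy case the vector points along the first axis, so only the first component of the activation moves. A negative `alpha` would push in the opposite direction.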
## Paper
TADA! Tuning Audio Diffusion Models through Activation Steering — [https://huggingface.co/papers/2602.11910](https://huggingface.co/papers/2602.11910)
## Quickstart
```python
from src.steering import SteerableAudioLDMModel, AudioLDMCAASteeringController

# Load the steerable model wrapper and the pretrained steering vectors.
model = SteerableAudioLDMModel(device="cuda")
ctrl = AudioLDMCAASteeringController.from_pretrained(
    "lukasz-staniszewski/audioldm2-caa-vocal-gender", alpha=1.0
)

# Generate with the steering controller applied inside the context manager.
with model.steer(ctrl):
    out = model.generate(
        prompt="instrumental music",
        num_inference_steps=30,
        audio_length_in_s=10.0,
        guidance_scale=3.5,
        seed=0,
    )
```
## Generation config
```json
{
  "method": "standard_caa_audioldm",
  "model": "cvssp/audioldm2-large",
  "concept": "vocal_gender",
  "num_inference_steps": 100,
  "audio_length_in_s": 10.0,
  "guidance_scale": 4.5,
  "seed": 10,
  "device": "cuda",
  "dtype": "float16",
  "save_all_cfg_passes": true,
  "layers_preset": "all",
  "layers_to_steer": [
    ".unet.down_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.1.attn2"
  ],
  "normalize_sv": true
}
```
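The `layers_to_steer` list above names 64 `attn2` modules spread over the UNet's down, mid, and up blocks (in diffusers' naming convention, `attn2` is typically the cross-attention module of a transformer block). To check how the steered layers are distributed, a small helper like the one below could group the names by block. This is a sketch: `count_layers_per_block` is a hypothetical function, not part of the library, and `sample` holds only a few of the entries for brevity.

```python
import re
from collections import Counter


def count_layers_per_block(layer_names):
    """Count layer names per top-level UNet block.

    Keys look like 'down_blocks.1', 'mid_block', or 'up_blocks.0'.
    """
    pattern = re.compile(r"^\.unet\.((?:down|up)_blocks\.\d+|mid_block)\.")
    counts = Counter()
    for name in layer_names:
        match = pattern.match(name)
        if match is None:
            raise ValueError(f"unexpected layer name: {name}")
        counts[match.group(1)] += 1
    return dict(counts)


# A few entries copied from layers_to_steer; pass the full list in practice.
sample = [
    ".unet.down_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.1.attn2",
]
per_block = count_layers_per_block(sample)
```

Loading the full config with `json.load` and passing `config["layers_to_steer"]` would give the per-block counts for the whole list.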