---
library_name: audio-interv
tags:
- activation-steering
- audio
- audioldm
- audioldm2
- caa
- diffusion
- interpretability
- music
- steering
- vocal-gender
---

# CAA — `vocal_gender` (AudioLDM2)

Steering vectors for the **vocal_gender** concept on AudioLDM2, computed via contrastive activation addition (CAA).
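
As a rough illustration of what contrastive activation addition does, the sketch below computes a mean-difference vector from two sets of activations and adds a scaled copy of it to a hidden state. It is a minimal NumPy sketch under stated assumptions, not this repository's implementation: the function names `caa_steering_vector` and `apply_steering`, the array shapes, and the random data are all hypothetical, and the `normalize` flag only presumably mirrors the `normalize_sv` option in the generation config.

```python
import numpy as np

def caa_steering_vector(pos_acts, neg_acts, normalize=True):
    """Mean-difference vector between two activation sets of shape (n, d)."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    if normalize:
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm  # unit length, so alpha alone sets the strength
    return v

def apply_steering(hidden, v, alpha=1.0):
    """Add alpha * v to every row of a (tokens, d) activation array."""
    return hidden + alpha * v

# Toy data standing in for activations collected on contrastive prompts.
rng = np.random.default_rng(0)
pos = rng.normal(loc=1.0, size=(8, 4))   # hypothetical "concept present" runs
neg = rng.normal(loc=-1.0, size=(8, 4))  # hypothetical "concept absent" runs

v = caa_steering_vector(pos, neg)
steered = apply_steering(np.zeros((3, 4)), v, alpha=2.0)
```

In this sketch, `alpha` plays the role of the `alpha` argument in the Quickstart: a larger magnitude pushes activations further along the concept direction, and a negative value pushes them the opposite way.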

## Paper

[TADA! Tuning Audio Diffusion Models through Activation Steering](https://huggingface.co/papers/2602.11910)

## Quickstart

```python
from src.steering import SteerableAudioLDMModel, AudioLDMCAASteeringController

# Load the steerable model and the pretrained steering vectors.
model = SteerableAudioLDMModel(device="cuda")
ctrl = AudioLDMCAASteeringController.from_pretrained(
    "lukasz-staniszewski/audioldm2-caa-vocal-gender", alpha=1.0
)

# Generate with the steering controller applied.
with model.steer(ctrl):
    out = model.generate(
        prompt="instrumental music",
        num_inference_steps=30,
        audio_length_in_s=10.0,
        guidance_scale=3.5,
        seed=0,
    )
```

## Generation config

```json
{
  "method": "standard_caa_audioldm",
  "model": "cvssp/audioldm2-large",
  "concept": "vocal_gender",
  "num_inference_steps": 100,
  "audio_length_in_s": 10.0,
  "guidance_scale": 4.5,
  "seed": 10,
  "device": "cuda",
  "dtype": "float16",
  "save_all_cfg_passes": true,
  "layers_preset": "all",
  "layers_to_steer": [
    ".unet.down_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.1.attn2"
  ],
  "normalize_sv": true
}
```
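
The `layers_to_steer` list above follows a regular pattern: every entry is a cross-attention (`attn2`) module, and each block uses a fixed set of attention indices with two transformer blocks each. The sketch below rebuilds the same 64 paths programmatically, which may be convenient for selecting a subset of layers. The helper name `attn2_layer_names` is hypothetical and not part of the repository; the indices are read directly off the config.

```python
from itertools import product

def attn2_layer_names():
    """Rebuild the attn2 module paths listed under layers_to_steer."""
    groups = [
        # (block path, attention indices that appear in the config)
        *[(f".unet.down_blocks.{i}", (1, 2, 5, 6)) for i in (1, 2, 3)],
        (".unet.mid_block", (1, 2)),
        *[(f".unet.up_blocks.{i}", (1, 2, 5, 6, 9, 10)) for i in (0, 1, 2)],
    ]
    names = []
    for block, attn_indices in groups:
        # Each attention module holds transformer_blocks 0 and 1.
        for attn, tb in product(attn_indices, (0, 1)):
            names.append(f"{block}.attentions.{attn}.transformer_blocks.{tb}.attn2")
    return names

layers = attn2_layer_names()
print(len(layers))  # 64

# Example: keep only the mid-block layers.
mid_only = [name for name in layers if ".mid_block." in name]
```

The generated order matches the order in the config (down blocks, then the mid block, then up blocks).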