---
library_name: audio-interv
tags:
- activation-steering
- audio
- audioldm
- audioldm2
- caa
- diffusion
- interpretability
- music
- steering
- vocal-gender
---

# CAA - `vocal_gender` (AudioLDM2)

Steering vectors for the **vocal_gender** concept on AudioLDM2, computed via contrastive activation addition (CAA).

## Paper

TADA! Tuning Audio Diffusion Models through Activation Steering: [https://huggingface.co/papers/2602.11910](https://huggingface.co/papers/2602.11910)

## Quickstart

```python
from src.steering import SteerableAudioLDMModel, AudioLDMCAASteeringController

model = SteerableAudioLDMModel(device="cuda")
ctrl = AudioLDMCAASteeringController.from_pretrained("lukasz-staniszewski/audioldm2-caa-vocal-gender", alpha=1.0)

with model.steer(ctrl):
    out = model.generate(
        prompt="instrumental music",
        num_inference_steps=30,
        audio_length_in_s=10.0,
        guidance_scale=3.5,
        seed=0,
    )
```

## Generation config

```json
{
  "method": "standard_caa_audioldm",
  "model": "cvssp/audioldm2-large",
  "concept": "vocal_gender",
  "num_inference_steps": 100,
  "audio_length_in_s": 10.0,
  "guidance_scale": 4.5,
  "seed": 10,
  "device": "cuda",
  "dtype": "float16",
  "save_all_cfg_passes": true,
  "layers_preset": "all",
  "layers_to_steer": [
    ".unet.down_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.1.attn2"
  ],
  "normalize_sv": true
}
```