metadata
library_name: audio-interv
tags:
  - activation-steering
  - audio
  - audioldm
  - audioldm2
  - caa
  - diffusion
  - interpretability
  - music
  - piano
  - steering

CAA — piano (AudioLDM2)

Steering vectors for the piano concept on AudioLDM2, computed via contrastive activation addition (CAA).
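As a rough illustration of what a CAA steering vector is, the sketch below follows the generic recipe: average the activations collected for concept prompts and for neutral prompts, take the difference, optionally normalize it, and add it (scaled by a strength `alpha`) to an activation at inference time. The function names and toy data are hypothetical and pure Python is used only to keep the example self-contained; this is not the code used to produce the vectors in this repository.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def caa_steering_vector(pos_acts, neg_acts, normalize=True):
    """Difference of mean activations: concept prompts minus neutral prompts.

    `normalize` is assumed to play the role of a flag such as `normalize_sv`:
    the difference is rescaled to unit L2 norm.
    """
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    sv = [p - n for p, n in zip(pos_mean, neg_mean)]
    if normalize:
        norm = math.sqrt(sum(x * x for x in sv))
        if norm > 0:
            sv = [x / norm for x in sv]
    return sv


def steer(activation, sv, alpha=1.0):
    """Shift one activation vector along the steering direction."""
    return [a + alpha * s for a, s in zip(activation, sv)]


# Toy activations (hypothetical): two "piano" samples, two neutral samples.
pos_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neg_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

sv = caa_steering_vector(pos_acts, neg_acts)
steered = steer([0.5, 0.5, 0.5], sv, alpha=2.0)
```

In this toy case the two sets differ only in the first coordinate, so the normalized steering vector points along that axis and steering moves only that coordinate.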

Paper

TADA! Tuning Audio Diffusion Models through Activation Steering — https://huggingface.co/papers/2602.11910

Quickstart

from src.steering import SteerableAudioLDMModel, AudioLDMCAASteeringController

# Load the steerable AudioLDM2 wrapper on the GPU.
model = SteerableAudioLDMModel(device="cuda")
# Fetch the piano CAA steering vectors from the Hub.
ctrl = AudioLDMCAASteeringController.from_pretrained("lukasz-staniszewski/audioldm2-caa-piano", alpha=1.0)

# Generation inside this context runs with the controller attached.
with model.steer(ctrl):
    out = model.generate(
        prompt="instrumental music",
        num_inference_steps=30, audio_length_in_s=10.0,
        guidance_scale=3.5, seed=0,
    )

Generation config

{
  "method": "standard_caa_audioldm",
  "model": "cvssp/audioldm2-large",
  "concept": "piano",
  "num_inference_steps": 100,
  "audio_length_in_s": 10.0,
  "guidance_scale": 4.5,
  "seed": 10,
  "device": "cuda",
  "dtype": "float16",
  "save_all_cfg_passes": true,
  "layers_preset": "all",
  "layers_to_steer": [
    ".unet.down_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.1.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.2.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.5.transformer_blocks.1.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.0.attn2",
    ".unet.down_blocks.3.attentions.6.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.1.transformer_blocks.1.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.0.attn2",
    ".unet.mid_block.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.0.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.1.attentions.10.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.1.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.2.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.5.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.6.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.9.transformer_blocks.1.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.0.attn2",
    ".unet.up_blocks.2.attentions.10.transformer_blocks.1.attn2"
  ],
  "normalize_sv": true
}
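The 64 entries in `layers_to_steer` follow a regular pattern: every listed attention index in each UNet block contributes the `attn2` module of its two transformer blocks (in diffusers transformer blocks, `attn2` is conventionally the cross-attention module). The sketch below rebuilds the list from that pattern, which can help when checking a config or selecting a subset of layers. The helper name `build_layer_names` is hypothetical and not part of the repository.

```python
def build_layer_names():
    """Rebuild the `layers_to_steer` list from its block/attention pattern."""
    transformer_blocks = (0, 1)
    # (block path, attention indices) in the order used by the config above.
    layout = [
        ("down_blocks.1", (1, 2, 5, 6)),
        ("down_blocks.2", (1, 2, 5, 6)),
        ("down_blocks.3", (1, 2, 5, 6)),
        ("mid_block", (1, 2)),
        ("up_blocks.0", (1, 2, 5, 6, 9, 10)),
        ("up_blocks.1", (1, 2, 5, 6, 9, 10)),
        ("up_blocks.2", (1, 2, 5, 6, 9, 10)),
    ]
    names = []
    for block, attention_ids in layout:
        for attn in attention_ids:
            for tb in transformer_blocks:
                names.append(
                    f".unet.{block}.attentions.{attn}.transformer_blocks.{tb}.attn2"
                )
    return names


layer_names = build_layer_names()
print(len(layer_names))  # 24 down + 4 mid + 36 up
```

Comparing this list against the `layers_to_steer` value loaded from the JSON is a quick way to confirm that a config still targets the full set of layers.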