Spaces:
Running on Zero
Project Guidelines — ACE Music Studio
Working notes for AI assistants editing this repo. This file is the what & why — the locked architecture, the gotchas, the sole-author rule. Companion to SKILLS.md (the how — process, debugging, deployment workflow) and AGENTS.md (tool-agnostic version of this file).
⚠ Sole-author rule (non-negotiable)
Mayank Gupta is the sole author on every commit in this repo. No exceptions.
When committing:
- NO
Co-Authored-By: Claude…(or any agent name) trailer. - NO "Generated with Claude Code" / "🤖 Generated with…" footers.
- NO
--author=…flag — let git use the user's configured identity. - NO attribution in PR descriptions.
If asked to amend, re-commit, or rebase, strip any prior agent attribution from the commit message. Treat any tooling that suggests adding a Claude trailer as a bug to ignore.
Architecture facts (locked — do not relitigate)
Spec: docs/superpowers/specs/2026-05-18-ace-music-studio-design.md
Plan: docs/superpowers/plans/2026-05-18-ace-music-studio.md
- Backend is ACE-Step 1.5 XL SFT — not ComfyUI. Installed from git (the package isn't on PyPI). The upstream repo is
git+https://github.com/ace-step/ACE-Step-1.5.git; the Apple Silicon fork isgit+https://github.com/clockworksquirrel/ace-step-apple-silicon.git. - Five tabs. Generate, Cover, Extend, Edit, Lyrics. Progressive disclosure — defaults stay short and reveal advanced controls only when asked.
- One pipeline instance. Single ACE-Step pipeline; mode handlers (generate / cover / extend / edit) call different pipeline entry points. No re-instantiation between calls.
@spaces.GPUis applied at module load time. Identity decorator off Spaces. The decorator'sduration=parameter takes a callable that estimates per-call timeout from(mode, params, multiplier). Estimator clamps at[60, 300] s.- Qwen 2.5 7B handles lyrics generation. Text-only inference; full multimodal weights are NOT required. On Mac the MLX path is used via mlx-lm.
- HF cache →
./models/<repo>/symlink. ACE-Step looks for files atlocal_model_path/....app._bootstrap()symlinks every cached snapshot into./models/<org>/<repo>/so the preload weights are findable. On Spaces, the build-user-owned~/.cache/huggingface/hubis mirrored to runtime-writable~/hf-cache-rw/first, then symlinked. - One Gradio process. Lazy backend singleton.
get_backend()constructs the pipeline on the first request (~30–60 s warm-up). Module import is fast.
Gotchas we already paid for (don't re-discover)
Each of these cost a debug cycle. Read once.
MPS / Apple Silicon
torch.mpshas nomem_get_info. Any VRAM-gate that calls that method raises AttributeError. Fix:vram_limit_for("mps")returnsNoneso the gate short-circuits.- Several ops aren't implemented on the MPS backend (SDPA variants, some index ops).
app.pysetsPYTORCH_ENABLE_MPS_FALLBACK=1so they degrade to CPU instead of crashing.
ACE-Step gotchas
TBD as discovered during M1+ implementation. Record new ones here as they come up.
Dependency footguns
ace-stepis NOT on PyPI. Install from git (seerequirements.txt).- Don't pin
spacesinrequirements.txt. HF Spaces' ZeroGPU build injects its own version. A pin causes pip-resolve failure. transformers >= 5may break imports. Pin:transformers>=4.45,<5.0.
Gradio 6.14 quirks
- Running version is
gradio>=6.14,<7.requirements.txtreflects this; HF Spacessdk_version: 6.14.0matches. - Don't put
<script>tags insidegr.HTMLblocks — they get stripped. JS goes ingr.Blocks(head=…). info=is not accepted bygr.Audioorgr.Fileon 6.14.tooltips.pykeeps the strings forCOVER_REF_AUDIO,EXTEND_SEED_AUDIO,EDIT_SOURCE_AUDIO,LORA_UPLOADas the single source of truth — when upstream landsinfo=on those components, they're a one-line wire-up away.- Slate-blue band around primary CTA: defeated via
.styler { background: transparent }intheme.CSS. If a future Gradio bump reintroduces it, the override needs revisiting.
HF Spaces deployment
preload_from_hubis build-time only. Runtime falls back to network if any required file isn't preloaded. Use broad globs so configs + index.json files come along.- ZeroGPU build injects
spaces==0.50.0. Ifrequirements.txtpinsspaces==0.30.0, pip resolution fails. Don't pinspacesat all — let HF provide it. - The
@spaces.GPUdecorator must be applied at module load. Runtime decoration isn't detected by ZeroGPU's startup analyzer.
Coding conventions
- Python 3.11. HF Spaces base image is 3.11; older syntax (like no
match) is fine. - Flat top-level layout. No
src/, no nested packages. One.pyper responsibility. - No conda.
python3.11 -m venv .venv;brewfor system binaries. - No emojis in code or commits unless explicitly requested. UI strings (CTA banner, button labels) are OK because they're user-facing copy, not code.
- Type hints on public functions. Internal helpers can skip them when obvious.
- Imports at the top of the file. Inline imports only to break circular deps OR to defer heavy modules (ace-step, torch, mlx) for fast CI startup.
ruff format+ruff checkboth pass in CI. No exceptions.
Commits
- Conventional Commits:
<type>(<scope>): <subject>— types:feat,fix,chore,docs,test,refactor,ci,perf. - Subject is imperative, lowercase, no trailing period.
- Body explains why when not obvious. Reference the spec / plan section if relevant.
- Frequent small commits — one logical change per commit.
- NO Claude trailer. See top of file.
Testing
- TDD per the plan. Each implementation task has the failing test first.
- L1 + L2 in CI (no GPU): module structure, mocked pipeline call boundaries, ruff.
tests/test_smoke_gpu.pyis the GPU smoke; it's marked with@pytest.mark.gpuand skipped by default (pyprojectaddopts = -m 'not gpu'). - No mocks for ACE-Step internals. Mock only the
pipe(...)call boundary so the mode-handler logic is verified at the boundary. - Use
pytest -m gputo opt into the GPU smoke (~32 GB download on a cold cache; runs full generate + cover + extend + edit).
Out of scope for v1 (don't add without asking)
Per spec §13:
- Multi-prompt batch queue
- Persistent generation history
- User accounts
- Telemetry dashboard
- Voice cloning (RVC)
- LoRA training in-app
- ControlNet-style conditioning
- Spectrogram visualization
- Multi-language UI strings
- Watermarking output audio
- Browser audio editing
- Multi-tenant rate limiting
- DAW export
If a task feels like it needs one of these, stop and ask the user.
When in doubt
- Read the spec + plan. Fifteen minutes of reading vs a day of wrong implementation.
- Read
SKILLS.mdfor the process side — debugging, deployment, when to commit, when to verify. git log --oneline— most non-obvious decisions have a fix-commit explaining the reasoning.- Ask the user before changing architectural shape or adding scope outside the v1 list.