PiedPiper / README.md
RajatA98's picture
Initial PiedPiper Space build - Dockerfile + backend + 160-track catalog
fe19082 verified
|
Raw
History Blame
3.42 kB
metadata
title: PiedPiper
emoji: 🎡
colorFrom: green
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit

PiedPiper API

FastAPI service that takes an AI-generated music track, encodes it to a 512-d LAION-CLAP music-tuned audio embedding (10s windowed, L2-normalized mean pool), and returns the top-K closest tracks in a 160-track reference catalog ranked by cosine similarity. A legacy /analyze endpoint preserved from the prior quality-detector pipeline returns a 7-signal librosa-based brokenness report.

Two independent secondary signals from ACRCloud β€” Cover Song ID and AI Music Detector β€” are exposed as additional /neighbors response fields behind the ENABLE_ACRCLOUD flag.

This is the backend half of the project. The React frontend on Vercel calls this service.

Endpoints

Method Path Purpose
GET /health {"ok": true, "model": "...", "corpus": <N>}
POST /neighbors?k=3 multipart file=... β†’ top-K matches with meanPooledSimilarity + maxSegmentSimilarity + optional ACRCloud signals
POST /analyze multipart file=... β†’ legacy 7-signal quality report

Configuration

Env var Default Purpose
CORS_ORIGIN http://localhost:5173 Frontend origin allowed in addition to the *.vercel.app regex. Set this to your Vercel production URL.
PORT 7860 Server bind port (HF Spaces provides this).
HF_HOME /app/.hf_cache Model cache location (set in the Dockerfile).
CORPUS_DIR (auto) Override the corpus directory. Defaults to /app/quality-scorer/public/corpus.
SIMILARITY_THRESHOLD_DEFAULT 0.70 Below this cosine, the frontend renders the "completely unique" empty state.
ENABLE_ACRCLOUD false Master gate for both ACRCloud signals.
ACRCLOUD_ACCESS_KEY β€” Cover Song ID HMAC access key.
ACRCLOUD_ACCESS_SECRET β€” Cover Song ID HMAC secret.
ACRCLOUD_HOST identify-eu-west-1.acrcloud.com Cover Song ID identification host.
ACRCLOUD_AI_DETECTOR_URL β€” AI Music Detector endpoint URL.
ACRCLOUD_AI_DETECTOR_BEARER β€” AI Music Detector bearer token.

Set these via the Space's Settings β†’ Variables and secrets tab. The ACRCLOUD_* secrets must be marked as Secret (not Variable) so they are not echoed in build logs.

Catalog rights

The reference corpus blends two free-licensable tiers:

  • Tier 1 β€” iTunes Search API previews. Per Apple terms the preview audio is streamed at request time, never cached locally. The catalog stores only metadata
    • the precomputed 512-d embedding. Each Tier-1 row carries attributionRequired: true and an attribution trackViewUrl link-out.
  • Tier 2 β€” MTG-Jamendo (CC-BY) loaded from the public mirror, normalized to the same embedding pipeline. Artist names are anonymized in metadata per the Jamendo distribution conventions.

The eval framing (/evaluation page) names this catalog composition explicitly as a known limitation.

Cold start

Free CPU Basic Spaces sleep after ~48 h idle and take ~30 s to wake on the first request. The PiedPiper frontend handles this with a "warming up the analyzer" UI state when a request exceeds 6 s. An UptimeRobot ping on /health every 5 minutes keeps the Space warm during the demo window β€” setup is documented in the top-level repo README.