---
title: PiedPiper
emoji: 🎵
colorFrom: green
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit
---
# PiedPiper API
FastAPI service that takes an AI-generated music track, encodes it as a 512-d
LAION-CLAP audio embedding (music-tuned checkpoint; 10 s windows, mean-pooled and
L2-normalized), and returns the **top-K closest tracks in a 160-track reference
catalog**, ranked by cosine similarity. A legacy `/analyze` endpoint, preserved
from the prior quality-detector pipeline, returns a 7-signal librosa-based
brokenness report.

Two independent secondary signals from ACRCloud, **Cover Song ID** and
**AI Music Detector**, are exposed as additional `/neighbors` response fields
behind the `ENABLE_ACRCLOUD` flag.

This is the backend half of the project; the React frontend on Vercel calls this
service.

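
The intro packs the embedding and ranking steps into one parenthetical. Below is a minimal pure-Python sketch of one reading of it, not the service's actual code: split the waveform into non-overlapping 10 s windows, embed each window, mean-pool, L2-normalize, then rank catalog vectors by dot product, which equals cosine similarity once both sides are unit length. `embed_window` is a hypothetical stand-in for the LAION-CLAP audio encoder, and keeping a shorter trailing window (rather than padding or dropping it) is an assumption.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def track_embedding(samples: Sequence[float], sample_rate: int,
                    embed_window: Callable[[Sequence[float]], Vector],
                    window_s: float = 10.0) -> Vector:
    """Embed fixed-length windows, mean-pool them, and L2-normalize the result.

    `embed_window` is a hypothetical placeholder for the CLAP audio encoder.
    """
    step = int(window_s * sample_rate)
    # Non-overlapping windows; the last one may be shorter (assumption: kept as-is).
    windows = [samples[i:i + step] for i in range(0, len(samples), step)]
    if not windows:
        raise ValueError("empty audio")
    per_window = [embed_window(w) for w in windows]
    dim = len(per_window[0])
    pooled = [sum(e[d] for e in per_window) / len(per_window) for d in range(dim)]
    return l2_normalize(pooled)


def top_k(query: Vector, catalog: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Rank catalog tracks by cosine similarity to `query`.

    Both sides are assumed to be unit length already, so cosine == dot product.
    """
    scored = [(track_id, sum(q * c for q, c in zip(query, emb)))
              for track_id, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Normalizing every vector once is what lets the ranking use a plain dot product, and a 160-track catalog is small enough that a linear scan is adequate without an approximate-nearest-neighbor index.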
## Endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/health` | `{"ok": true, "model": "...", "corpus": <N>}` |
| `POST` | `/neighbors?k=3` | multipart `file=...` → top-K matches with `meanPooledSimilarity`, `maxSegmentSimilarity`, and optional ACRCloud signals |
| `POST` | `/analyze` | multipart `file=...` → legacy 7-signal quality report |

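
For a quick check without the frontend, the sketch below uploads a file to `/neighbors` using only the Python standard library. It is illustrative, not a published client: the base URL, the `audio/mpeg` content type, and the 60 s timeout (chosen to ride out a cold start) are assumptions, and the response is returned as raw JSON because this README does not specify its shape beyond the `meanPooledSimilarity` and `maxSegmentSimilarity` fields.

```python
import json
import os
import urllib.parse
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream") -> tuple:
    """Encode one file part as multipart/form-data.

    Returns (body bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex  # random token; assumed not to occur inside `data`
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def query_neighbors(base_url: str, audio_path: str, k: int = 3,
                    timeout_s: float = 60.0) -> dict:
    """Upload an audio file to /neighbors?k=<k> and return the decoded JSON body."""
    with open(audio_path, "rb") as fh:
        # "audio/mpeg" is an assumption; use the type that matches your file.
        body, ctype = build_multipart("file", os.path.basename(audio_path),
                                      fh.read(), "audio/mpeg")
    url = f"{base_url.rstrip('/')}/neighbors?{urllib.parse.urlencode({'k': k})}"
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    # Generous timeout so the first request can survive a cold start.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

Usage would look like `query_neighbors("https://<owner>-<space>.hf.space", "track.mp3", k=3)`, with the placeholder replaced by your Space's URL.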
## Configuration
| Env var | Default | Purpose |
|---|---|---|
| `CORS_ORIGIN` | `http://localhost:5173` | Frontend origin allowed in addition to the `*.vercel.app` regex. Set this to your Vercel production URL. |
| `PORT` | `7860` | Server bind port (HF Spaces provides this). |
| `HF_HOME` | `/app/.hf_cache` | Model cache location (set in the Dockerfile). |
| `CORPUS_DIR` | (auto) | Override the corpus directory. Defaults to `/app/quality-scorer/public/corpus`. |
| `SIMILARITY_THRESHOLD_DEFAULT` | `0.70` | Cosine-similarity cutoff: when no match reaches this value, the frontend renders the "completely unique" empty state. |
| `ENABLE_ACRCLOUD` | `false` | Master gate for both ACRCloud signals. |
| `ACRCLOUD_ACCESS_KEY` | (none) | Cover Song ID HMAC access key. |
| `ACRCLOUD_ACCESS_SECRET` | (none) | Cover Song ID HMAC secret. |
| `ACRCLOUD_HOST` | `identify-eu-west-1.acrcloud.com` | Cover Song ID identification host. |
| `ACRCLOUD_AI_DETECTOR_URL` | (none) | AI Music Detector endpoint URL. |
| `ACRCLOUD_AI_DETECTOR_BEARER` | (none) | AI Music Detector bearer token. |

Set these via the Space's **Settings → Variables and secrets** tab. The
`ACRCLOUD_*` secrets must be added as Secrets (not Variables) so they are not
echoed in build logs.

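
For the Cover Song ID credentials above, the sketch below shows how a server might gate on `ENABLE_ACRCLOUD` and sign an identification request. It assumes the Cover Song ID project uses ACRCloud's standard `/v1/identify` HMAC-SHA1 scheme (signature version 1), as in ACRCloud's public identification examples. The accepted truthy spellings of `ENABLE_ACRCLOUD` and the skip-when-credentials-are-missing behavior are assumptions, not documented behavior of this service.

```python
import base64
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class AcrCloudConfig:
    access_key: str
    access_secret: str
    host: str


def load_acrcloud_config(env: Mapping[str, str]) -> Optional[AcrCloudConfig]:
    """Return Cover Song ID credentials, or None when the feature is off or incomplete."""
    # Assumption: these spellings count as "enabled"; the default is "false".
    if env.get("ENABLE_ACRCLOUD", "false").strip().lower() not in {"1", "true", "yes"}:
        return None
    key = env.get("ACRCLOUD_ACCESS_KEY")
    secret = env.get("ACRCLOUD_ACCESS_SECRET")
    if not key or not secret:
        return None  # assumption: skip the signal rather than fail the request
    return AcrCloudConfig(key, secret,
                          env.get("ACRCLOUD_HOST", "identify-eu-west-1.acrcloud.com"))


def sign_identify_request(cfg: AcrCloudConfig, timestamp: Optional[int] = None,
                          data_type: str = "audio") -> Dict[str, str]:
    """Build the signed form fields for POST /v1/identify (HMAC-SHA1, version 1)."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    # String to sign: method, URI, access key, data type, signature version, timestamp.
    string_to_sign = "\n".join(["POST", "/v1/identify", cfg.access_key, data_type, "1", ts])
    digest = hmac.new(cfg.access_secret.encode("ascii"),
                      string_to_sign.encode("ascii"), hashlib.sha1).digest()
    return {
        "access_key": cfg.access_key,
        "data_type": data_type,
        "signature_version": "1",
        "timestamp": ts,
        "signature": base64.b64encode(digest).decode("ascii"),
    }
```

In ACRCloud's examples these fields are posted as multipart form data to the `/v1/identify` path on `ACRCLOUD_HOST`, alongside a `sample` file part and its `sample_bytes` length. The AI Music Detector uses its own bearer-token endpoint and is not covered by this sketch.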
## Catalog rights
The reference corpus blends two freely licensable tiers:

- **Tier 1**: iTunes Search API previews. Per Apple's terms, preview audio is
  streamed at request time and never cached locally; the catalog stores only
  metadata and the precomputed 512-d embedding. Each Tier-1 row carries
  `attributionRequired: true` and a `trackViewUrl` attribution link-out.
- **Tier 2**: MTG-Jamendo (CC-BY), loaded from the public mirror and run through
  the same embedding pipeline. Artist names are anonymized in metadata, following
  the Jamendo distribution conventions.

The evaluation write-up (`/evaluation` page) explicitly names this catalog
composition as a known limitation.

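
The rights rules above translate into per-row invariants that a corpus build step could check. The sketch below is hypothetical: the `tier` and `embedding` keys, and the `audioPath`/`audioBytes` keys used to detect cached audio, are illustrative names; only `attributionRequired` and `trackViewUrl` come from this README.

```python
from typing import Any, List, Mapping

EMBEDDING_DIM = 512
# Hypothetical keys whose presence would mean preview audio was cached locally.
FORBIDDEN_AUDIO_KEYS = {"audioPath", "audioBytes"}


def catalog_row_problems(row: Mapping[str, Any]) -> List[str]:
    """Return a list of rule violations for one catalog row (empty means OK)."""
    problems: List[str] = []
    emb = row.get("embedding")
    if not isinstance(emb, list) or len(emb) != EMBEDDING_DIM:
        problems.append(f"embedding must be a {EMBEDDING_DIM}-d list")
    if row.get("tier") == 1:
        # Tier 1 (iTunes previews): attribution is mandatory, audio is never stored.
        if row.get("attributionRequired") is not True:
            problems.append("Tier-1 row must set attributionRequired: true")
        if not row.get("trackViewUrl"):
            problems.append("Tier-1 row needs a trackViewUrl link-out")
        cached = FORBIDDEN_AUDIO_KEYS & set(row)
        if cached:
            problems.append(f"Tier-1 row must not cache preview audio: {sorted(cached)}")
    return problems
```

Running such a check over all 160 rows before publishing the corpus would catch a Tier-1 row that lost its attribution link or picked up a cached preview.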
## Cold start
Free CPU Basic Spaces sleep after ~48 h idle and take ~30 s to wake on the first
request. The PiedPiper frontend handles this with a "warming up the analyzer" UI
state when a request exceeds 6 s. An UptimeRobot ping on `/health` every 5 minutes
keeps the Space warm during the demo window; setup is documented in the top-level
repo README.
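
If UptimeRobot is not an option, an equivalent keep-warm loop can run from any always-on machine. This is a hypothetical stdlib-only sketch, not part of the repo: it requests `/health` every 5 minutes and counts failed pings, with a 60 s timeout chosen to outlast the ~30 s wake-up.

```python
import time
import urllib.request
from typing import Callable, Optional


def ping_health(base_url: str, timeout_s: float = 60.0) -> bool:
    """Return True if GET /health answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False


def keep_warm(ping: Callable[[], bool], interval_s: float = 300.0,
              rounds: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `ping` every `interval_s` seconds and return the number of failures.

    With rounds=None the loop runs forever; `sleep` is injectable so the loop
    can be exercised without actually waiting.
    """
    failures = 0
    done = 0
    while rounds is None or done < rounds:
        if not ping():
            failures += 1
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)  # no sleep after the final round
    return failures
```

A typical invocation would be `keep_warm(functools.partial(ping_health, "https://<owner>-<space>.hf.space"))`, with the placeholder replaced by your Space's URL.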