Andrew committed on
Commit ·
59521c0
1
Parent(s): 7376794
Add lyric-conditioning findings and dataset guidance
summaries/findings.md +17 -1
summaries/findings.md
CHANGED
|
@@ -191,13 +191,29 @@ Even after cleanup, these are still weak points in current sidecars:
|
|
| 191 |
|
| 192 |
These are optional for training, but adding reliable BPM/key would likely improve control and consistency.
|
| 193 |
|
| 194 |
## My current recommendation
|
| 195 |
|
| 196 |
1. Keep NVIDIA stack as default for annotation generation quality.
|
| 197 |
2. Keep core LoRA fields simple and valid.
|
| 198 |
3. Keep rich details in `source.rich_details` for traceability.
|
| 199 |
4. Keep detail-rich caption text for actual conditioning.
|
| 200 |
-
5.
|
| 201 |
|
| 202 |
## Next technical step I want
|
| 203 |
|
| 191 |
|
| 192 |
These are optional for training, but adding reliable BPM/key would likely improve control and consistency.
|
| 193 |
|
| 194 |
+
## Lyric conditioning finding (important)
|
| 195 |
+
|
| 196 |
+
I needed to document what I observed around lyrics and artist similarity:
|
| 197 |
+
|
| 198 |
+
- In my tests, generations could still sound fairly close to Andrew Spacey when I prompted with lyrics from songs in my catalog, even when lyric fields in training data were incomplete.
|
| 199 |
+
- My interpretation is that the model + LoRA already learned part of the vocal timbre, cadence, and arrangement style from audio/caption conditioning, so familiar lyric patterns still anchored output.
|
| 200 |
+
- But this was inconsistent when I switched to fully new lyrics. Similarity dropped more often, especially in phrasing and hook behavior.
|
| 201 |
+
|
| 202 |
+
What this means for my dataset strategy:
|
| 203 |
+
|
| 204 |
+
- I should include lyrics in sidecar JSON for vocal tracks whenever possible.
|
| 205 |
+
- I should keep lyric formatting consistent (`[Verse]`, `[Chorus]`, etc.) so the text-conditioning path is stable.
|
| 206 |
+
- For instrumentals, I should use a consistent marker such as `[Instrumental]`.
|
| 207 |
+
- Captions still carry major weight, but caption + lyrics together give me the best chance at stable artist-like generations with new text.
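The dataset rules above could be checked before training with something like the sketch below. It is only an illustration: the sidecar key name `lyrics`, the `is_vocal` flag, the helper `check_lyrics`, and the set of allowed section tags are all assumptions, not the actual sidecar schema.

```python
import re

# Hypothetical conventions: the key name "lyrics" and the allowed section
# tags are assumptions for illustration, not the project's real schema.
ALLOWED_TAGS = {"Verse", "Chorus", "Bridge", "Intro", "Outro", "Instrumental"}
TAG_LINE = re.compile(r"^\[([^\[\]]+)\]$")


def check_lyrics(sidecar: dict, is_vocal: bool) -> list:
    """Return a list of human-readable problems found in the lyric field."""
    problems = []
    lyrics = (sidecar.get("lyrics") or "").strip()

    # Instrumentals should carry exactly the agreed marker.
    if not is_vocal:
        if lyrics != "[Instrumental]":
            problems.append("instrumental track should use the [Instrumental] marker")
        return problems

    if not lyrics:
        return ["vocal track has no lyrics"]

    # Collect every line that consists solely of a bracketed section tag.
    tags = []
    for line in lyrics.splitlines():
        match = TAG_LINE.match(line.strip())
        if match:
            tags.append(match.group(1))

    if not tags:
        problems.append("no section tags such as [Verse] or [Chorus] found")
    for tag in tags:
        # Allow numbered variants such as "Verse 2" by checking the first word.
        parts = tag.split()
        base = parts[0] if parts else ""
        if base not in ALLOWED_TAGS:
            problems.append(f"unexpected section tag: [{tag}]")
    return problems
```

Running this over each sidecar after `json.load` would flag vocal tracks with empty lyric fields and tags whose casing or spelling drifts from the chosen convention.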
|
| 208 |
+
|
| 209 |
## My current recommendation
|
| 210 |
|
| 211 |
1. Keep NVIDIA stack as default for annotation generation quality.
|
| 212 |
2. Keep core LoRA fields simple and valid.
|
| 213 |
3. Keep rich details in `source.rich_details` for traceability.
|
| 214 |
4. Keep detail-rich caption text for actual conditioning.
|
| 215 |
+
5. Fill lyric fields for all vocal songs and keep lyric structure consistent.
|
| 216 |
+
6. Add a BPM/key estimation pass next if I want stronger metadata conditioning.
|
| 217 |
|
| 218 |
## Next technical step I want
|
| 219 |
|