Andrew committed
Commit 59521c0 · 1 Parent(s): 7376794

Add lyric-conditioning findings and dataset guidance

Files changed (1)
  1. summaries/findings.md +17 -1
summaries/findings.md CHANGED
@@ -191,13 +191,29 @@ Even after cleanup, these are still weak points in current sidecars:

  These are optional for training, but adding reliable BPM/key would likely improve control and consistency.

+ ## Lyric conditioning finding (important)
+
+ I needed to document what I observed around lyrics and artist similarity:
+
+ - In my tests, generations could still sound fairly close to Andrew Spacey when I prompted with lyrics from songs in my catalog, even when lyric fields in training data were incomplete.
+ - My interpretation is that the model + LoRA already learned part of the vocal timbre, cadence, and arrangement style from audio/caption conditioning, so familiar lyric patterns still anchored output.
+ - But this was inconsistent when I switched to fully new lyrics. Similarity dropped more often, especially in phrasing and hook behavior.
+
+ What this means for my dataset strategy:
+
+ - I should include lyrics in sidecar JSON for vocal tracks whenever possible.
+ - I should keep lyric formatting consistent (`[Verse]`, `[Chorus]`, etc.) so the text-conditioning path is stable.
+ - For instrumentals, I should use a consistent marker such as `[Instrumental]`.
+ - Captions still carry major weight, but caption + lyrics together give me the best chance at stable artist-like generations with new text.
+
  ## My current recommendation

  1. Keep NVIDIA stack as default for annotation generation quality.
  2. Keep core LoRA fields simple and valid.
  3. Keep rich details in `source.rich_details` for traceability.
  4. Keep detail-rich caption text for actual conditioning.
- 5. Add a BPM/key estimation pass next if I want stronger metadata conditioning.
+ 5. Fill lyric fields for all vocal songs and keep lyric structure consistent.
+ 6. Add a BPM/key estimation pass next if I want stronger metadata conditioning.

  ## Next technical step I want

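
Note on the lyric guidance in this diff: the rules it describes (lyrics present for vocal tracks, one consistent section-tag style, a fixed `[Instrumental]` marker) could be checked mechanically before training. The following is only a minimal sketch under stated assumptions. The sidecar schema is not shown in this commit, so the `lyrics` key name, the exact tag pattern, and the `check_lyrics` helper are all hypothetical and would need adjusting to the real files.

```python
import re

# Hypothetical key name; the real sidecar schema is not shown in this commit.
LYRICS_KEY = "lyrics"
INSTRUMENTAL_MARKER = "[Instrumental]"
# Assumed tag style: a whole line such as "[Verse]", "[Chorus]", "[Pre-Chorus]" or "[Verse 2]".
SECTION_TAG = re.compile(r"^\[[A-Z][A-Za-z-]*(?: \d+)?\]$")


def check_lyrics(sidecar: dict) -> list[str]:
    """Return a list of problems found in one sidecar's lyric field."""
    lyrics = (sidecar.get(LYRICS_KEY) or "").strip()
    if not lyrics:
        return ["lyrics missing or empty"]
    if lyrics == INSTRUMENTAL_MARKER:
        return []  # consistent instrumental marker, nothing else to check

    problems = []
    lines = [line.strip() for line in lyrics.splitlines() if line.strip()]
    if not SECTION_TAG.match(lines[0]):
        problems.append("lyrics do not start with a section tag")
    for line in lines:
        # Any line that looks like a tag must follow the one agreed style.
        if line.startswith("[") and not SECTION_TAG.match(line):
            problems.append(f"inconsistent section tag: {line}")
    return problems
```

Running this over every sidecar and listing the files that return a non-empty result would give a quick picture of which tracks still need lyric cleanup. Whether stricter rules are worthwhile (for example, a fixed whitelist of tag names) depends on how the text-conditioning path actually tokenizes these tags, which this commit does not specify.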