arxiv:2506.07520

LeVo: High-Quality Song Generation with Multi-Preference Alignment

Published on Jun 9, 2025

Abstract

LeVo, a framework combining an LM and a music codec, improves lyrics-to-song generation by modeling mixed and dual-track tokens in parallel with decoder-only transformers and by applying Direct Preference Optimization to enhance musicality and instruction following.

Recent advances in large language models (LLMs) and audio language models have significantly improved music generation, particularly lyrics-to-song generation. However, existing approaches still struggle with the complex composition of songs and the scarcity of high-quality data, which limits sound quality, musicality, instruction following, and vocal-instrument harmony. To address these challenges, we introduce LeVo, an LM-based framework consisting of LeLM and a music codec. LeLM models two types of tokens in parallel: mixed tokens, which represent the combined audio of vocals and accompaniment to achieve vocal-instrument harmony, and dual-track tokens, which encode vocals and accompaniment separately for high-quality song generation. It employs two decoder-only transformers and a modular extension training strategy to prevent interference between the token types. To further enhance musicality and instruction following, we introduce a multi-preference alignment method based on Direct Preference Optimization (DPO). This method handles diverse human preferences through a semi-automatic data construction process and DPO post-training. Experimental results demonstrate that LeVo consistently outperforms existing methods on both objective and subjective metrics. Ablation studies further support the effectiveness of our designs. Audio examples are available at https://levo-demo.github.io/.
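
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not LeVo's implementation: the abstract does not say how the multi-preference variant combines its preference dimensions, and the function name, argument names, and the `beta` default are illustrative assumptions. Each argument is the summed log-probability of a full token sequence under the trained policy or the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Hypothetical helper for illustration; beta=0.1 is an assumed default,
    not a value taken from the paper.
    """
    # How much more (in log space) the policy likes each sample than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the chosen sample.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)) = log(1 + exp(-margin)),
    # split by sign so exp() never receives a large positive argument.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy that favors the chosen sample gets a lower loss than one that favors the rejected sample.
good = dpo_loss(-5.0, -9.0, -6.0, -6.0)
bad = dpo_loss(-9.0, -5.0, -6.0, -6.0)
print(good < bad)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2; training lowers the loss by raising the chosen sample's likelihood relative to the rejected one, measured against the reference.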


