Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model ⋆ - Institut de Recherche et Coordination Acoustique/Musique
Conference paper · Year: 2024


Tornike Karchkhadze
Mohammad Rasool Izadi
Ke Chen
Shlomo Dubnov
Gérard Assayag

Abstract

Diffusion models have shown promising results in cross-modal generation tasks involving audio and music, such as text-to-sound and text-to-music generation. These text-controlled music generation models typically focus on generating music by capturing global musical attributes like genre and mood. However, music composition is a complex, multilayered task that often involves musical arrangement as an integral part of the process. Arrangement requires composing each instrument to align with the existing ones in terms of beat, dynamics, harmony, and melody, demanding greater precision and control over individual tracks than text prompts usually provide. In this work, we address these challenges by extending MusicLDM, a latent diffusion model for music, into a multi-track generative model. By learning the joint probability of tracks that share a context, our model can generate music across several mutually coherent tracks, either conditionally or unconditionally. Additionally, our model supports arrangement generation: it can generate any subset of tracks given the others (e.g., generating a piano track that complements given bass and drum tracks). We compared our model with an existing multi-track generative model and demonstrated that ours achieves considerable improvements across objective metrics, for both total and arrangement generation tasks. Sound examples can be found at https://mtmusicldm.github.
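The arrangement-generation idea described in the abstract (generating a subset of tracks conditioned on the others) can be sketched with a toy inpainting-style reverse-diffusion loop: at each denoising step, the latents of the given tracks are clamped to their known values while the masked tracks are refined. This is a minimal illustrative sketch; the `denoise_step` stand-in, the function names, and the latent shapes are assumptions for illustration, not the paper's trained MusicLDM U-Net or its actual sampler.

```python
import numpy as np

def denoise_step(x, t, num_steps):
    """Toy stand-in for a learned denoiser: shrink the sample toward 0
    as the timestep t decreases (NOT the paper's trained model)."""
    return x * (t / num_steps)

def generate_arrangement(known_tracks, known_mask, shape, num_steps=50, seed=0):
    """Generate the masked-out tracks conditioned on the known ones.

    known_tracks: array (n_tracks, latent_dim) holding the given tracks
    known_mask:   boolean array (n_tracks,), True = track is given
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)              # start from pure noise
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t, num_steps)       # one reverse-diffusion step (toy)
        x[known_mask] = known_tracks[known_mask]  # clamp the given tracks
    return x

# Example: 4 tracks; e.g. "bass" and "drums" are given,
# "piano" and "guitar" latents are generated to accompany them.
latent = np.zeros((4, 8))
latent[0] = 1.0    # hypothetical "bass" latent
latent[1] = -1.0   # hypothetical "drums" latent
mask = np.array([True, True, False, False])
out = generate_arrangement(latent, mask, shape=(4, 8))
```

The clamping step is the key design choice: because the known tracks are reimposed after every denoising step, the generated tracks are steered, step by step, toward samples that are jointly consistent with them.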

File under embargo until Tuesday, 15 October 2024.

Dates and versions

hal-04715297, version 1 (30-09-2024)


Cite

Tornike Karchkhadze, Mohammad Rasool Izadi, Ke Chen, Shlomo Dubnov, Gérard Assayag. Multi-Track MusicLDM: Towards Versatile Music Generation with Latent Diffusion Model ⋆. EAI ArtsIT 2024, Nov 2024, Abu Dhabi, United Arab Emirates. ⟨hal-04715297⟩

Collections

IRCAM