Meta unveils MusicGen, an AI that creates tunes from text descriptions

A demonstration of the technology, hosted on the Hugging Face AI platform, lets users describe their desired music style using a variety of examples

Meta CEO Mark Zuckerberg
Pranav Dixit
Jun 13, 2023 (Updated Jun 13, 2023 3:39 PM IST)

Meta's AudioCraft research team has recently unveiled MusicGen, an open-source deep-learning language model that generates original music from text prompts and can optionally be guided by an existing melody. Similar to a ChatGPT for audio, MusicGen lets users describe their desired music style, optionally supply a reference melody, and click "Generate" to watch the magic unfold. After a reasonable processing time (a little under 3 minutes in my experience), MusicGen produces a short, original musical piece based on the provided text prompt and melody.
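
For readers who want to experiment beyond the web demo, the sketch below shows what basic text-to-music generation looks like with Meta's open-source audiocraft package. It is a minimal sketch rather than the article's own workflow; the checkpoint name, clip duration, and output filenames are illustrative assumptions.

    # Minimal sketch: text-to-music with Meta's open-source audiocraft package
    # (assumed installed via `pip install audiocraft`). Checkpoint name, duration,
    # and filenames are illustrative assumptions.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")  # smallest (300M-parameter) checkpoint
    model.set_generation_params(duration=12)  # seconds of audio to generate

    descriptions = ["an 80s driving pop song with heavy drums and synth pads in the background"]
    wavs = model.generate(descriptions)  # returns one waveform per text prompt

    for i, wav in enumerate(wavs):
        # Write each clip to disk as a WAV file with loudness normalisation.
        audio_write(f"musicgen_sample_{i}", wav.cpu(), model.sample_rate, strategy="loudness")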

A demonstration of this technology, hosted on the Hugging Face AI platform, gives users the opportunity to describe their desired music style using a variety of examples.

For instance, one might request "an 80s driving pop song with heavy drums and synth pads in the background." Users can then "condition" MusicGen using a selected song snippet, up to 30 seconds in length, with controls that allow them to specify a particular portion of the track. With a simple click of the "Generate" button, MusicGen renders a high-quality musical sample lasting up to 12 seconds.
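
As a rough illustration of that melody-conditioning step, the sketch below uses the melody-capable MusicGen checkpoint from the audiocraft package; the reference file path, prompt, and clip length are assumptions for illustration only.

    # Rough sketch of melody conditioning with audiocraft's melody-capable
    # MusicGen checkpoint. File path and prompt are illustrative assumptions.
    import torchaudio
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    model.set_generation_params(duration=12)

    # Load a reference snippet (the demo accepts clips up to about 30 seconds).
    melody, sr = torchaudio.load("reference_snippet.wav")

    wavs = model.generate_with_chroma(
        ["an 80s driving pop song with heavy drums and synth pads"],
        melody[None],  # add a batch dimension: (batch, channels, samples)
        sr,
    )
    audio_write("melody_conditioned_sample", wavs[0].cpu(), model.sample_rate, strategy="loudness")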

To train the MusicGen model, the research team utilised an impressive 20,000 hours of licensed music. This comprehensive dataset consisted of 10,000 top-notch music tracks from an internal collection, in addition to tracks from well-known sources like Shutterstock and Pond5. 

To optimise performance, the team employed Meta's 32kHz EnCodec audio tokeniser, which breaks music into smaller segments that can be processed in parallel. Hugging Face ML engineer Ahsen Khaliq noted in a tweet that MusicGen stands out from existing methods such as MusicLM because it doesn't require a self-supervised semantic representation and needs only 50 auto-regressive steps per second of audio.
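
To put that figure in perspective, the short sketch below does the back-of-the-envelope arithmetic implied by 50 auto-regressive steps per second; the clip lengths used are simply the 12-second demo output and the 30-second reference snippet mentioned above.

    # Back-of-the-envelope sketch: decoding steps needed at 50 auto-regressive
    # steps per second of audio (the rate cited in the article).
    STEPS_PER_SECOND = 50

    def decoding_steps(clip_seconds: float) -> int:
        """Number of auto-regressive steps for a clip of the given length."""
        return int(clip_seconds * STEPS_PER_SECOND)

    for seconds in (12, 30):  # the demo's 12-second output and a 30-second reference clip
        print(f"{seconds:>2} s of audio -> {decoding_steps(seconds)} steps")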

Although Google introduced a similar music generator called MusicLM last month, MusicGen appears to yield slightly superior results. In a sample comparison published by the researchers, MusicGen's output was pitted against MusicLM as well as two other models, Riffusion and Mousai, reinforcing its strong performance. MusicGen can also be run locally (a GPU with at least 16GB of VRAM is recommended) and is available in four model sizes, ranging from small (300 million parameters) to large (3.3 billion parameters), with the largest offering the greatest potential for generating intricate, complex compositions.
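
For anyone weighing which of those model sizes their hardware can handle, here is a rough sketch that picks a checkpoint based on available GPU memory; the memory thresholds and the mid-size fallback are assumptions for illustration, not official requirements.

    # Rough sketch: choose a MusicGen checkpoint size from available GPU memory.
    # Thresholds are illustrative assumptions, not official requirements.
    import torch

    def pick_checkpoint() -> str:
        if not torch.cuda.is_available():
            return "facebook/musicgen-small"  # CPU fallback; generation will be slow
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if total_gb >= 16:
            return "facebook/musicgen-large"   # 3.3 billion parameters
        if total_gb >= 8:
            return "facebook/musicgen-medium"  # assumed mid-size checkpoint
        return "facebook/musicgen-small"       # 300 million parameters

    print("Suggested checkpoint:", pick_checkpoint())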

Deep learning models like MusicGen continue to make significant advances, encroaching on yet another creative domain: music composition.

