Google built an AI that can generate music from text descriptions, but it won’t release it • TechCrunch
An impressive new AI system from Google can generate music in any genre with a text description. But the company, fearing the risks, has no immediate plans to launch it.
Called MusicLM, Google's system is certainly not the first generative AI for songs. There have been other attempts, including Riffusion, an AI that composes music by visualizing it, as well as Dance Diffusion, Google's own AudioML and OpenAI's Jukebox. But owing to technical limitations and limited training data, none has been able to produce songs that are particularly complex in composition or high in fidelity.
MusicLM is perhaps the first that can do it.
Detailed in an academic paper, MusicLM was trained on a dataset of 280,000 hours of music to learn to generate coherent songs from descriptions of, as its creators put it, "significant complexity" (for example, "enchanting jazz song with a memorable saxophone solo and a solo singer" or "90s Berlin techno with a low bass and strong kick"). Its songs, remarkably, sound something like what a human artist might compose, though not necessarily as inventive or musically cohesive.
It's hard to overstate how good the samples sound, given that there are no musicians or instrumentalists in the loop. Even when fed somewhat long and meandering descriptions, MusicLM manages to capture nuances like instrumental riffs, melodies, and moods.
The sample caption below, for example, included the “induces the experience of being lost in space” bit, and it certainly delivers on that front (at least to my ears):
Here’s another sample, generated from a description that begins with the sentence “The main soundtrack of an arcade game.” Plausible, right?
MusicLM's capabilities extend beyond generating short song clips. The Google researchers show that the system can build on existing melodies, whether hummed, sung, whistled or played on an instrument. In addition, MusicLM can take several written descriptions in sequence (for example, "time to meditate", "time to wake up", "time to run", "time to give 100%") and create a kind of melodic "story" or narrative, several minutes long, well suited to a movie soundtrack.
See the example below, which was generated from the sequence "electronic song playing in a video game", "meditation song playing by a river", "fire", "fireworks".
That's not to say MusicLM is perfect; far from it, honestly. Some of the samples have a distorted quality, an unavoidable side effect of the training process. And while MusicLM can technically generate vocals, including choral harmonies, they leave a lot to be desired. Most of the "lyrics" range from barely English to pure gibberish, sung by synthesized voices that sound like amalgamations of several artists.
Still, the Google researchers note the many ethical challenges a system like MusicLM poses, including a tendency to incorporate copyrighted material from training data into the generated songs. In one experiment, they found that around 1% of the music the system generated was replicated directly from the songs on which it was trained, a threshold apparently high enough to discourage them from releasing MusicLM in its current state.
"We acknowledge the risk of potential misappropriation of creative content associated with the use case," the paper's co-authors wrote. "We strongly emphasize the need for more future work in tackling these risks associated with music generation."
Assuming MusicLM or a similar system becomes available one day, major legal issues seem inevitable. They have already arisen, albeit around simpler AI systems. In 2020, Jay-Z's record label filed copyright claims against a YouTube channel, Vocal Synthesis, for using AI to create Jay-Z covers of songs like Billy Joel's "We Didn't Start the Fire." After initially removing the videos, YouTube reinstated them, finding the takedown requests were "incomplete." But deepfake music still stands on murky legal ground.
A white paper authored by Eric Sunray, now a legal intern at the Music Publishers Association, argues that AI music generators like MusicLM violate music copyright by creating "coherent audio tapestries of the works they ingest in training," thereby infringing the reproduction right of U.S. copyright law. Following the release of Jukebox, critics have also questioned whether training AI models on copyrighted music constitutes fair use. Similar concerns have been raised around the training data used in AI systems that generate images, code, and text, which is often scraped from the web without creators' knowledge.
From the user perspective, Andy Baio of Waxy speculates that music generated by an AI system would be considered a derivative work, in which case only the original elements would be protected by copyright. Of course, it is unclear what would count as "original" in such music; using it commercially means entering uncharted waters. It's a simpler matter if the generated music is used for purposes protected by fair use, such as parody and commentary, but Baio expects that courts will have to make judgments on a case-by-case basis.
It may not be long before there is some clarity on the matter. Several lawsuits making their way through the courts will likely have a bearing on music-generating AI, including one concerning the rights of artists whose work is used to train AI systems without their knowledge or consent. But time will tell.