TLDR: I made a song using my own 3LAU generative AI music model, added some GrimesAI vocals, and it sounds really good. The song is out now! AND on Jan 12th you’ll be able to access the same tools I used, via Royal!
Check out 3LAU AI x Grimes AI - For You Below!
INTRO
Everyone’s been talking about AI music, but most people don’t listen to it regularly. Why? The answer is pretty straightforward: it just isn’t really good yet. While we’re at it, let’s loosely define “really good” as a level of quality where listeners actively choose to replay a song multiple times because they genuinely like it.
As the music-machine-learning hype train took off in 2023, I found myself obsessing over this production-quality problem, exploring potential solutions despite having little personal experience in AI / ML. As we all know too well, sometimes you just gotta dive down a bottomless rabbit hole and pray that people are interested in what you discover when you emerge.
Most interactions in sound production are mathematical. Digital audio, at its core, is just numbers - a stream of 0’s & 1’s. An AI music model could, and should, be able to replicate every decision a musician makes when completing a song; it just needs the appropriate data & computing power to create quality outputs. So what’s holding the tech back from high production quality today? A few variables are worth mentioning.
Producers who spend months fine-tuning sonic relationships are pretty protective of their project files, AI music copyright questions are seemingly endless, and rendering full songs at scale can be quite expensive from a GPU-cost standpoint. Still, in my mind, there had to be a way to showcase the power of generative music that could also sound “really good” - and soon.
What could a human do to close the gap between a pure AI output and the intricate, detailed choices a musician makes?
In the short term at least, we have to think about generating sound differently from how we think about generating clusters of pixels when prompting a model to create content.
Having listened to a bunch of “AI beats” and “AI vocal” model copycats of The Weeknd & Kanye, I began to form a thesis on how to create a fully generative song, with some manual creative post-production to improve quality. I went into the process with managed expectations, never anticipating that I’d actually want to release the music I ended up making with AI tools. I definitely proved myself wrong.
My unsurprising hypothesis was to generate the two core parts of a song - the beat and the vocal - independently, using an ML model to create a generative output for each component. To be honest, this is exactly how music producers think about marrying vocals to beats through signal processing and editing. It is by no means a new or revolutionary approach, but it’s likely been overlooked in the past when thinking about composing an “all AI” song.
Without further ado, let’s walk through how I got this output to sound “acceptable” by my own standards, which made me comfortable enough to release it today as my new song called For You.
THE BEAT
The first step: how could I leverage a music model to generate new 3LAU-like beats based on a variety of my previous work? We’ll refer to this as the 3LAU Template moving forward, since that’s essentially the format of the generative model we built! Some initial context before we dive in, for those new to music production: there are two main sonic components to every song:
MIDI - A technical communication protocol that defines which notes happen, and when, in a song, triggering sounds in different sequences & at different lengths. (It also connects hardware devices to software and much more, but we’ll leave that deeper dive out for now).
Audio - Actual recordings of sound: either full-length recordings or individual one-shot sounds (samples) that exist independently and can be triggered by MIDI.
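For anyone who likes seeing this in code, here’s a tiny Python sketch of the distinction - the class and variable names are just illustrative, not anything from a real DAW or model:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MidiNote:
    """One MIDI note event: which pitch plays, when, for how long, and how hard."""
    pitch: int           # MIDI note number, e.g. 60 = middle C
    start_beat: float    # where in the song the note begins, in beats
    length_beats: float  # how long the note is held
    velocity: int        # 0-127, how hard the note is "hit"


# MIDI is pure instructions -- it makes no sound on its own.
melody = [
    MidiNote(pitch=60, start_beat=0.0, length_beats=0.5, velocity=100),
    MidiNote(pitch=67, start_beat=0.5, length_beats=1.0, velocity=90),
]

# Audio is the sound itself: a long array of amplitude values sampled
# tens of thousands of times per second. Here, random noise stands in
# for a real 0.1-second one-shot sample.
SAMPLE_RATE = 44_100
kick_one_shot = np.random.uniform(-1.0, 1.0, SAMPLE_RATE // 10)
```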
The two images above are screenshots of the project I worked on to create our example song “For You,” which just came out. You can access the project file, as well as all the MIDI & stems of the AI output and the final version of the song, HERE.
To build the template, I partnered with Soundful; they construct models as hybrids between probabilistic generative MIDI and real samples (audio one-shots) to compile a generative output. Much of the AI music you’ve probably heard in the past is pure-form, in the sense that the ML model looks at existing data and resynthesizes the sound from the ground up as raw audio. That tends to create artifacts and less natural-sounding results, which undermine our perception of high quality.
I tend to keep my previous project files organized, so open-sourcing my work was pretty straightforward. Freezing each individual track (committing / printing a sound & all of the plugins used to edit that sound) across 10 3LAU songs was a great place to start. That meant the engineers who built our template could work with both the real audio I’ve used across all my songs AND the MIDI I used to write melodies and chord voicings in the past. Finally, open-sourcing the way I use plugins to mix, master, and process each individual sound drastically improves the quality of a music template, as new generations can mimic the same sonic relationships between sounds in a contained project. That covers many variables: note velocity, plugin automation, and even side-chain compression curve styling. For those curious, some of the songs we used to train the generative template include Tokyo, Is It Love, How You Love Me, Touch & Easy.
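Conceptually, each frozen track handed over for training bundled something like the record below. This is my own illustrative schema in Python, not Soundful’s actual format - the field names, values, and paths are made up:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FrozenTrack:
    """Everything captured for one frozen track of one project (illustrative only)."""
    name: str                        # e.g. "Lead Synth"
    audio_stem: str                  # path to the printed / frozen audio
    midi_clip: Optional[str] = None  # path to the MIDI that drove it, if any
    note_velocities: list[int] = field(default_factory=list)
    plugin_chain: list[str] = field(default_factory=list)  # mix/master processing, in order
    automation: dict[str, list[float]] = field(default_factory=dict)  # parameter curves over time
    sidechain_source: Optional[str] = None  # which track's signal ducks this one


lead = FrozenTrack(
    name="Lead Synth",
    audio_stem="stems/lead_synth.wav",
    midi_clip="midi/lead_synth.mid",
    note_velocities=[96, 110, 84],
    plugin_chain=["EQ", "Compressor", "Reverb"],
    automation={"filter_cutoff": [0.2, 0.5, 0.9]},
    sidechain_source="Kick",
)
```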
So how does it work? The 3LAU template is unique in that the generative MIDI component triggers samples from a giant library of existing audio that I’ve created over the past 12 years of my production career! This influences the output quality in a huge way, and of course, the results sound a lot like my previous work. It also means that when the 3LAU template is triggered to make something new, the user receives a fully rendered export of the song as well as its components (each individual instrument’s MIDI track & audio stem, separated by type: drums, synths, bass lines, etc.).
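As a rough mental model of that “generative MIDI triggers real samples” step, here’s a toy Python sketch - the events, tempo, and library contents are invented for illustration, and this is nowhere near the actual template code:

```python
import numpy as np

SAMPLE_RATE = 44_100
SECONDS_PER_BEAT = 60 / 126  # assume a 126 BPM house tempo

# A library of real one-shots pulled from past projects, keyed by instrument type.
# (Random noise stands in for the actual recordings here.)
sample_library = {
    "kick": np.random.uniform(-1, 1, SAMPLE_RATE // 8),
    "synth": np.random.uniform(-1, 1, SAMPLE_RATE // 2),
}

# Pretend these (instrument, start_beat, velocity) events came from the generative MIDI model.
generated_events = [("kick", 0.0, 110), ("kick", 1.0, 110), ("synth", 0.0, 95)]


def render_stems(events, library, total_beats=8):
    """Trigger real samples with generated MIDI and return one audio stem per instrument."""
    length = int(total_beats * SECONDS_PER_BEAT * SAMPLE_RATE)
    stems = {name: np.zeros(length) for name in library}
    for instrument, start_beat, velocity in events:
        one_shot = library[instrument]
        start = int(start_beat * SECONDS_PER_BEAT * SAMPLE_RATE)
        end = min(start + len(one_shot), length)
        stems[instrument][start:end] += one_shot[: end - start] * (velocity / 127)
    return stems


stems = render_stems(generated_events, sample_library)
full_mix = sum(stems.values())  # the "fully rendered export" is just the stems summed together
```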
Once the 3LAU template was dialed in, I was thrilled to test it out! After generating 10 or so examples, I picked my favorite one to turn into a real song. It was a wild & pleasantly surprising experience; listening through all of the options, every one of them could have been an idea I made from scratch.
HERE is the one I picked to turn into a real song and HERE is a folder of ideas I didn’t pick to finish, mainly for subjective personal preference reasons.
Now that I had the full-form output of my favorite generation, I was glad to have access to each independent track / element that made up the song so I could edit it at will.
In an attempt to maintain the maximum integrity of the AI output, I wanted to keep the arrangement of the song (breakdown length, drop timing, etc.) exactly the same. Most of the editing only involved slight equalization and compression, but there was a single sound replacement - the bass - where I used a software synthesizer to add some punch. I also adjusted the pattern to be more rhythmic than the original output’s straight 1/16th notes. That said, it was more creative deletion than addition.
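To give a concrete flavor of that bass edit, here’s a toy sketch of what thinning a straight 1/16th grid into a groovier pattern looks like - the specific hits below are made up for illustration, not the actual notes from For You:

```python
# A straight 1/16th-note bassline: a hit on every 16th step of a 4-beat bar.
straight_sixteenths = [i * 0.25 for i in range(16)]  # beat positions 0.0, 0.25, ..., 3.75

# A more rhythmic variation: keep only a syncopated subset of those hits,
# leaving space so the bassline grooves instead of droning.
rhythmic_pattern = [0.0, 0.75, 1.5, 2.0, 2.75, 3.5]

removed = [beat for beat in straight_sixteenths if beat not in rhythmic_pattern]
print(f"kept {len(rhythmic_pattern)} of 16 hits, removed {len(removed)}")
```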
As I completed the mix / master, I was startled by the fact that the 3LAU Template effectively WROTE and composed the song; I barely did anything at all. The next piece of the puzzle was adding a little flair with a vocal. Ironically enough, the vocal chops from the AI output over the drop were not derived from the GrimesAI vocal below. Since I use vocal chops in a lot of my own records, the template actually generated those tastefully from scratch. The next section dives a little deeper into how I matched the GrimesAI vocal to the existing awesome-sounding beat we chose from the 3LAU template.
THE VOCAL
In June of 2023, Grimes took a huge step for all vocalists, becoming the first major artist in the world to make her voice open-source. Grimes was an obvious choice for finding the right vocal to match the AI beat, and her vocal AI product - elf.tech - was incredibly easy to use.
Vocal AI products can be a bit tricky to understand if you haven’t used one before. There are two primary ways to get quality vocal character out of any vocal model: text-to-speech, or using reference audio as the input. While text-to-speech is self-explanatory (think Siri reading a text to you), reference audio means using another voice to trigger the AI to speak or sing the same words in the same way, with the same rhythm.
For this song, reference audio was a better-quality approach because it enabled me to control the speed & the rhythm of the GrimesAI output, which was of course essential in having it match the beat.
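Here’s a generic sketch of those two input modes in Python. The VoiceModel class and its methods are hypothetical placeholders to show the idea - this is not elf.tech’s actual API:

```python
import numpy as np

SAMPLE_RATE = 44_100


class VoiceModel:
    """A stand-in for a vocal AI model; these methods are hypothetical placeholders,
    not elf.tech's actual API."""

    def synthesize(self, text: str) -> np.ndarray:
        # Text-to-speech: the model has to invent timing, melody, and phrasing itself.
        return np.zeros(SAMPLE_RATE)  # placeholder audio

    def convert(self, reference_audio: np.ndarray) -> np.ndarray:
        # Voice conversion: the reference performance supplies the words, pitch
        # contour, and rhythm; a real model would swap only the timbre.
        return np.copy(reference_audio)  # placeholder passthrough


grimes_ai = VoiceModel()

# Option 1: text-to-speech -- simple, but the timing won't follow the beat.
tts_take = grimes_ai.synthesize("placeholder lyric")

# Option 2 (used for this song): drive the model with reference audio whose
# rhythm already matches the beat, so the output locks in by construction.
reference_vocal = np.random.uniform(-1, 1, SAMPLE_RATE * 4)  # stand-in for the Splice sample
converted_take = grimes_ai.convert(reference_vocal)
```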
I found THIS cool vocal sample / lyric on Splice and used it to trigger the GrimesAI model. The vocal model is spectacular - so good that I could even change the source of reference audio between two different voices. While fixing a weird pitch glitch, I re-sang a few of the vocal phrases myself, in the exact way I wanted GrimesAI to sing them - and it all matched perfectly.
The cool part here was discovering the modularity of vocal AI tools. You can pull reference audio from multiple voices with completely different characters, and the GrimesAI output still sounds like Grimes!
It really felt like magic.
The song evolved into an awesome example of what my friend & Grimes’s manager Daouda likes to call Artistic Intelligence - the use of AI models to “create with yourself.”
THE RESULT & WHAT’S NEXT!
We made the song with the 3LAU template beat-creation model, but we wanted the world to be able to use it too. That’s why the team at Royal built onchain infrastructure to enable licensing of the model, plus publishing tools for anyone who wants to use it.
Blockchains are particularly good at establishing provenance and ensuring immutability, especially when it comes to verifying the origin of a creative model and linking its outputs to the artist who trained it.
By fusing this model with tokenized rights, we make it indisputable who creates and owns its outputs. Users can take advantage of all the composability of onchain assets and easily show off their collaboration with an artist across web3.
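As a loose illustration of the provenance idea (this is not Royal’s actual implementation, and the identifiers are made up), the core of such a record can be as simple as a content hash tied to a model and an artist, published onchain:

```python
import hashlib
import json
import time


def provenance_record(model_id: str, artist: str, output_audio: bytes) -> dict:
    """A toy provenance record: fingerprint an output and tie it to the model and
    artist that produced it. Illustrative only -- not Royal's implementation."""
    return {
        "model_id": model_id,
        "artist": artist,
        "output_hash": hashlib.sha256(output_audio).hexdigest(),
        "timestamp": int(time.time()),
    }


record = provenance_record("3lau-template-v1", "3LAU", b"rendered audio bytes go here")
# Publishing this record (or its hash) onchain makes the link between artist,
# model, and output tamper-evident and publicly verifiable.
print(json.dumps(record, indent=2))
```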
Of course, tokenization also gives the model creator full control over its use, enabling experimentation with licensing price discovery through scarcity mechanics. That’s why, in conjunction with the song launch, Royal built a new prototype product called Sonic, which we’ll let speak for itself ;)
Get ready for some crypto, music, AI magic this Friday, January 12th!