The AI Song Contest

A code – be it a chord, melody or rhythm, in someone’s head or on a disk – can be turned into puffs of air to create this exquisite, yet ultimately unquantifiable substance we know as music. So how can a machine help us write music for humans?

The gorgeous accidents of the musician’s fingers slipping across their instrument, falling into a new collection of notes – or that warm feeling when a voice breaks between notes of a melody and makes the whole tune sweeter – this is the intimate sound of a flawed and beautiful musician, their power, their individuality, their magic – the song inside the tune, as Christina Aguilera once said.

So how can you go from that mystery of music to asking a machine to write a hit song? Well, my team ‘Smorgasborg’, one of 38 entries to this year’s AI Song Contest, decided to explore this question.

Voting closes on July 1st – listen to as many entries as you like – and vote up the ones you enjoy the most!

This year’s participants are a fantastic collection of artists, technologists and machine learning enthusiasts, all of whom approached the brief differently – make music – using machines! Now, as a musician learning the elements of coding, I approached the challenge from a musical perspective – choosing to discover “How can we use accessible AI tools to augment the human creative process rather than replace it?”

I’d taught myself AI composing with Style Transfer during my cancer treatment and subsequent shielding here in the UK. It’s surprising how much time there is for learning when utterly stationary, or unable to leave the house, or indeed when it’s impossible to sleep through the night… so why not stay up and run code every few minutes?

My initial results were interesting but not what I would call ‘easy listening’. This time would be different.

Like many musicians, I miss jamming with real people in real time, that physiological response to sound that makes us stand with others in front of massive speakers ‘in flow state’ – that feeling of being somewhere or nowhere – ‘in the zone’.

AI’s potential goes much further than just audio generating audio – which is what I’d been doing – and it just didn’t feel enough like a collaboration. After all, we could have asked the machine to produce random sounds and then selected the best. But instead, we split the song – some elements created by AI, some by the musician – and NO autotune!

Here’s a flowchart showing the relationship between human- and machine-generated content.

My team discussed many issues – even at the level of deciding what popular music is – for example, dominant music theory is based on the Western, European tradition. This was partly why we elected to start with AI lyric generation rather than music, as it kept orchestral and melodic options open. We felt AI music would have been more likely to generate Western diatonic sounds, as the data sets are currently largely lacking in music outside these descriptors. We wanted to explore some influences outside the popular Western sounds.

Here are a few seeding prompts, shown before the ‘|’ divider below. For a while I was tempted to go with really simple songs. Then, during a team call, I read through a few that were strangely poetic.

Example Prompts | Completions from GPT-3 (safe outputs) follow:

Do people want a catchy tune, a riff that can fly them to the moon? | Or would they rather just see me in my underpants?…

dance dance I wanna | see you move dance dance dance! (Oh!) It’s time to make it right (Oh!) We’re gonna make it…

write a love song about 2 machines | falling in luv. 1 hour later. BAM BAM BAM BAM BAM BAM “i just wan 2 say, that’s one of the best songs i’ve ever heard” “I KNOW RIGHT” “i mean, i would probably pay like $1.99 just to get that song on my phone.” “do it, do it now, i’ll wait” BAM BAM BAM BAM BAM BAM “i did it” “KEEP THE CHANGE…” 

After further experimentation, the below completion caught our attention.  

feel the bass dance all night headphone on smartphone light walking like a zombie in the dark pulsating body nightlife make you move vibrating body sending shiver to the core….

These words sounded most like a story; they had evocative phrases like ‘walking like a zombie’. We fed a modified version of this prompt back into the engine and generated our lyrics, which remained unchanged throughout the songwriting process.

feel the bass dance all night headphones on smartphone light walking like a zombie in the dark | ‘til the streetlight burns the sunrise behind your back you were here now you’re gone There’s a girl I know who knows what she wants.

I was inspired by the words and created a first draft of the melody, but was getting stuck on ‘Shine like a diamond dust in the universe’. We wanted to use the lyrics verbatim to stay faithful to the AI, but were stumped on how to parse this particular musical phrase. So we used OpenAI’s Jukebox trained on Adele to suggest various new melodic lines.

At first I used a model to output 30 seconds of music at a time – but my first attempts were frustrating – it didn’t create tunes that made theoretical sense! After more false starts, I realised co-composing suited me more, given my mainly musical background. Supervising every 4-6 secs added my own musical preferences to the generative result. 

After 21 attempts (and more crashes!), attempt 22 inspired me to re-scan the lyric lines –

|| Shine like a diamond  ||  Dust in the universe became

|| Shine || Like a diamond dust  ||  In the universe.

Yes! I gleefully thanked the program out loud even though it was 1:30 AM, then sang a guide melody and played a piano accompaniment into Logic Pro X. I felt no need to upsample as I wasn’t planning to output audio and just needed to hear the melodic lines.

Google’s NSynth – one of the settings used with Ableton | Imaginary Soundscape – the image used to generate fireworks in the chorus

The bass, piano and pad are all generated via NSynth sounds. I was inspired by teammate Leila saying the song was set “In Space” and chose sounds based on this thought – resulting in ethereal and floating pads, with sharp shards of passing comet dust! Continuing the theme, we also used AI-generated audio from Imaginary Soundscape, an online engine designed to add suggested soundscapes to (normally earth) landscapes. We used an image of space from Unsplash and the AI returned audio – fireworks! You can hear these alongside the chorus.

If you’d like to help us become Top of the Bots – please vote here – no need to register! Team Smorgasborg is excited to be part of the AI Song Contest!

A selection of AI tools used in the creative process: we also used Deep Music Visualizer and Tokkingheads for the music video

GPT-3 – Lyric generation from word prompts and questions  https://beta.openai.com 

Jukebox (OpenAI) – neural network style transfer for solving musical problems https://jukebox.openai.com

NSynth – machine learning based synthesiser for sound generation https://magenta.tensorflow.org/nsynth

Imaginary Soundscape –  AI generated soundscapes from images   https://www.imaginarysoundscape.net

DreamscopeApp – deep dream image generator https://dreamscopeapp.com/deep-dream-generator 

Music Video for Team Smorgasborg, LJ, Dav and Leila.

“I knew the song was finished when Logic gave me the “System Overload” message.”

-very late at night

3 Surprising AI Music Mashups that will make you question your musical tastes

24-hour streaming AI-generated heavy metal on YouTube completely fascinated me – created by the eccentric Dadabots, half of whom I’ve regularly collaborated with on various strange musical projects. Their outputs inspired me to start my own journey of intersecting music with machine learning.

I’ve been composing since I was a kid on whatever platform I could find. Classically trained with a music degree while hungry for as much new music as possible makes for a strange hybrid, a musician and performer trying to understand a technologist’s world.

Amid much struggling and general frustration and many false starts, the stubbornness and late night wrangling paid off. I had my first track and plucked up the courage to share some of my experiments online.

So, here’s one of my first flirtations with Music and Machine Learning on Instagram – the Beatles singing ‘Call Me Maybe’ – because for some reason I thought it needed to exist. And, buoyed by my coding success, I learned how to generate some eye-bending video based on pitch and tempo too.

Each track takes quite a few hours to generate – even 45 seconds or so is a whole evening of attention. I work by heavily supervising the code: I need to intervene every few seconds to suggest a new direction for the algorithm so that the output heads where I want it to go. A lot of the decisions I’m making are not technical – they’re based on my musical knowledge. Then I listen repeatedly to the slowly lengthening audio to see if there’s a recognisable tune being created. Is it sounding like something a human can sing? Plus the ‘upsampling’ process, where some of the noise is removed, can take many hours. A lot of the time I’ll crash out of the virtual machine I’m using because I’m on the free tier. Sometimes I’ll lose everything.
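For readers who think in code, the supervising loop described above can be sketched roughly as follows. This is a minimal illustration in Python, not Jukebox code: `propose_candidates` is a hypothetical stand-in that invents short runs of pitches instead of generating audio, and `prefer_rising` is a made-up chooser standing in for the musician’s ear. The only point is the shape of the process: generate a few alternatives, pick one, append it, repeat.

```python
import random
from typing import Callable, List

# A "chunk" stands in for a few seconds of generated audio.
# Here it is just a short list of MIDI-style pitch numbers (assumption for illustration).
Chunk = List[int]


def propose_candidates(history: Chunk, n_candidates: int,
                       chunk_len: int, rng: random.Random) -> List[Chunk]:
    """Hypothetical stand-in for the generative model.

    Produces a few alternative continuations as small random walks
    starting from the last pitch chosen so far (or middle C, 60).
    """
    start = history[-1] if history else 60
    candidates = []
    for _ in range(n_candidates):
        pitch, chunk = start, []
        for _ in range(chunk_len):
            pitch += rng.choice([-2, -1, 0, 1, 2])
            chunk.append(pitch)
        candidates.append(chunk)
    return candidates


def co_compose(choose: Callable[[Chunk, List[Chunk]], int], n_steps: int,
               n_candidates: int = 3, chunk_len: int = 4, seed: int = 0) -> Chunk:
    """Build a melody chunk by chunk, asking `choose` to pick at every step."""
    rng = random.Random(seed)
    melody: Chunk = []
    for _ in range(n_steps):
        options = propose_candidates(melody, n_candidates, chunk_len, rng)
        picked = choose(melody, options)  # the human intervention point
        melody.extend(options[picked])
    return melody


def prefer_rising(history: Chunk, options: List[Chunk]) -> int:
    """Made-up chooser: pick the alternative that ends on the highest pitch."""
    return max(range(len(options)), key=lambda i: options[i][-1])


melody = co_compose(prefer_rising, n_steps=3)
print(melody)
```

In a real session the chooser would be a person listening to each alternative, and shortening `chunk_len` would correspond to intervening more often.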

Sounds frustrating, and it’s even more annoying in practice. Yet I find the ultimately infuriating nature of co-composing this way rather addictive. And, wow, when it actually does work, the results are incredibly rewarding.

So, ‘my’ new song made up of thousands of tiny bites of Beatles was compiled. And it is undeniably the Beatles singing ‘Call Me Maybe’ – so much so that a few of my friends thought this could easily be a demo tape or an unheard song if not for the lyrics.

My work received admiration from those familiar with AI music generation – they could tell how much effort was required to create it. As well as praise, this short tune also generated unsettling feelings for others, which weirdly excited me. To have made something so conversation-worthy, especially in a field as wide as AI and Machine Learning, felt like I was onto something, and that my musical approach could add value in its own way.

Here’s another one – Queen singing ‘Let It Go’.

So why do I think this might make you question your musical tastes? Well, many of us are quite specific about the music we like. But if a fifty-year-old Beatles recording can be rehashed for a 21st-century audience, would this track encourage a non-Beatles listener to explore more of this kind of music? Or would a devout 1960s music fan be persuaded to venture outside their comfort decade into the world of sugary pop music? I think it could.

Here’s U2 singing ‘Bat Out Of Hell’.

I’m surprised how much the original artist maintains their presence in each of these examples. And I’m somewhat tickled that the processing and supervision of each track makes this a very labour-intensive activity – not unlike standard music production.

As a new composing method, I am in awe of the sheer amount of work that must have gone into creating this program, and the brilliant minds behind it who conceived and created such a formidable tool for co-creation.

It even seems possible to train the AI on any kind of music as long as the artist has made enough material to be sampled adequately. This is great news for those of us keen to create cross-cultural artworks. Even though there are thousands of artists in the current Jukebox library, the content does appear to skew toward English-speaking music – a useful reminder that bias is built into every system with humans at one end of it. So one of my next quests will be to see whether I can create my own training set (which might prove taxing on the free tier).

Finally, from a musical perspective, human composers still have quite a few advantages over machines, though generating music with AI is like a whole band writing all its parts at once, which can be very satisfying, if erratic. Sometimes the algorithm is temperamental – and doesn’t work at all. Other times, sublimely beautiful chords and ad-libs come out. No one can know whether the next track is a hit or a miss.

Even controlling the output is gloriously elusive: for example, I can’t force a tune to go up or down at any point (though I can choose one of the alternatives that fits roughly where I’d like the tune to go). I don’t have much choice over the rate or meter of the lyrics – though there is some leeway when paginating them in the code. And changing the rate of intervention also affects what’s being generated. In short, it is the illusion of pulling order from chaos, a pleasing reflection of what composing music means to me.

In quite a few instances the AI has surprised me musically, and that is intriguing enough on its own for me to want to continue creating and co-composing with a machine. With so many possibilities in this field right now, I’m looking forward to exploring more.