Creating a Rap-Bot

Text Generation with Deep Learning (RNN)

Darius Fuller
6 min read · Mar 5, 2021
Photo by Max Kleinen on Unsplash

Introduction

I have always heard that the best ideas just come to you without effort, and if you are lucky, at the exact time that you need them. I am someone with a lot of ideas and often love to explore them in the real world if possible. In this case, I wondered: “Could I train a computer to rap like a human?” Fortunately, I was in a position to actually try this out!

I was learning about Deep Learning, a method of Machine Learning using artificial neural networks, when I came across a video covering text generation with TensorFlow (link). The presenter, Laurence Moroney, demonstrates this concept using traditional Irish songs; I was curious whether or not this would be possible using a more modern genre of song: Rap.

Having dealt with the TensorFlow/Keras API when performing my first Twitter sentiment analysis, I knew how simple this could be to implement given the proper setup. Always interested in testing the limits, I wanted to do more than just generate rap lyrics; I wanted them to sound real. They needed to sound like someone. So I decided to limit the training data set to lyrics pulled from one artist’s catalogue, hoping to “clone” their style digitally and use it to randomly generate lyrics.

The Toolkit

My end goal was to deploy this model so that it could be used by anyone with internet access, so I tried to keep my setup as simple as possible. I coded everything in Python using these packages:

Data Collection/Exploration: BeautifulSoup (for the lyric scraping described below)

Text Generation: TensorFlow/Keras

I also wrote custom functions built on these packages to improve readability and/or perform specific tasks (see “functions.py” in the GitHub repository).

Gathering the Words

As with many projects, I first needed to flex my data wrangling skills before I could begin doing any of the fun stuff. This took some time. Through some trial and error and some BeautifulSoup work, I was able to amass a collection of 180 songs by a single rapper to train my model on.

If you are curious as to how I was able to do something like this in more detail, check out my blog post on the topic.

Once I had all of the songs, I spent some time understanding what to expect from my model. This is rule #1 for me when it comes to any dataset, especially one of my own making. Once I adjusted for stop words (using the standard English list), this rapper’s top three most frequently used words are interesting:

  1. Like — 11.65%
  2. B**** — 7.15%
  3. Got — 6.82%

This person likes to use similes in their lyrics, so I am not surprised by the first entry, but the second is higher than I expected; a whole 7% of their lyrics cannot be played over the radio! However, if I expand the stop word list to include more words (shown in this thread), I end up with a top three consisting entirely of curse words. With usage rates of 7.15%, 5.04%, and 4.47% respectively, it is evident this rapper uses colorful language to get their message across.
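For those curious, here is a minimal sketch of how such a frequency check might look. The `songs` list and the extra stop words are hypothetical stand-ins, and NLTK’s English list plays the role of the “standard English list” mentioned above:

```python
import re
from collections import Counter

from nltk.corpus import stopwords  # requires nltk.download('stopwords')

def top_words(songs, extra_stops=(), n=3):
    """Return the n most common words with their share of all counted words."""
    stops = set(stopwords.words('english')) | set(extra_stops)
    words = []
    for song in songs:
        # Lowercase, then pull out word tokens (keeping apostrophes)
        words += [w for w in re.findall(r"[a-z']+", song.lower())
                  if w not in stops]
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, round(100 * c / total, 2)) for w, c in counts.most_common(n)]

# `songs` would hold the 180 scraped lyric strings
# print(top_words(songs, extra_stops={'like', 'got'}))
```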

Getting Creative

Taking what I learned from the EDA, I began following the steps as outlined in the inspirational video for this project, tweaking whatever I needed to fit my goals/data.

In summary:

  • Tokenized each song, maintaining line breaks (where each “bar” ends)
  • Processed all of the songs, creating a numerical representation at the character level (including spaces)
  • Created lookup tables mapping each character to its corresponding integer for easy conversion in both directions
  • Converted the lyrics into integer sequences using the lookup tables
  • Created input and target batches from the integer sequences (these are used later to train the model; see the sketch after this list)
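Here is a minimal sketch of those steps, modeled on the TensorFlow text generation tutorial this project follows; the short `lyrics` string is a hypothetical stand-in for the full scraped corpus:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the 180-song corpus; "\n" marks where a bar ends
lyrics = "im like a drum\nreal talk all around\n"

# Lookup tables: character -> integer, and integer -> character
vocab = sorted(set(lyrics))
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = np.array(vocab)

# Convert the lyrics into one long integer sequence
text_as_int = np.array([char2idx[ch] for ch in lyrics])

# Slice into chunks, then split each chunk into an input (all but the
# last character) and a target (all but the first character)
seq_length = 100
sequences = tf.data.Dataset.from_tensor_slices(text_as_int) \
                           .batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    return chunk[:-1], chunk[1:]

dataset = (sequences.map(split_input_target)
                    .shuffle(10000)
                    .batch(64, drop_remainder=True))
```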

The Structure

Whenever you use a neural network, the network’s architecture can be one of the most influential factors in how, or whether, you complete your task. In this case, I had the suggested build from TensorFlow available, so I used it as a baseline. The baseline really pushed the limits of my CPU, so I scaled some of the parameters down (with respect to complexity). After some tweaking I ended up with:

  • One embedding layer [256 dimensions]
  • One GRU (Gated Recurrent Unit) layer [1024 units]
  • One output layer [42 units — from the number of unique characters]
Model summary

Side note: A GRU (Gated Recurrent Unit) is a type of RNN (Recurrent Neural Network) layer that is well suited to text generation tasks because it can “remember” data from previous timesteps when looking at the current, or next, step (more info on RNNs).
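In Keras terms, the build looks roughly like this; a sketch assuming the 42-character vocabulary above and a batch size of 64, following the TensorFlow tutorial’s pattern:

```python
import tensorflow as tf

vocab_size = 42      # number of unique characters in the lyrics
embedding_dim = 256  # embedding layer dimensions
rnn_units = 1024     # GRU units

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units, return_sequences=True, stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size),  # one logit per character
    ])

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=64)
model.summary()
```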

Results!

After some very long training runs, I got results I was satisfied with. I was able to achieve this level of performance in only 35 epochs!

The model was evaluated using a loss function called “Sparse Categorical Cross-Entropy”, which essentially compares the predicted probability for each character against the actual lyrics on a character-by-character basis. This can get pretty deep, so if you’re interested I recommend checking out this post by Jason Brownlee.
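Wiring this loss into the model from the sketch above might look like the following; the `from_logits=True` flag is needed because the output layer applies no softmax:

```python
import tensorflow as tf

# Sparse categorical cross-entropy compares the predicted character
# distribution against the true next character (given as an integer id)
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)
history = model.fit(dataset, epochs=35)  # `dataset` from the earlier sketch
```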

Overall, the model was able to perform the task I needed it to — generate rap lyrics — without being too overfit to the training data. My main concern, as I came to find out, was that certain repeated phrases would begin to loop during text generation. This required close monitoring of training progress and routine checks of performance.

Here are some examples:

Pre-Training

  • Input: “or fake b”
  • Predictions: “1jzptc 7to”
  • Final output: “or fake b1jzptc 7to”

Post-Training

  • Input: “im like”
  • Predictions: “a drum….”
  • Final output: “im like a drum/ real n#### b###### know my back round/ bring me oh oh hundreds hundreds the sound/ world around”
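The generation step itself is worth a sketch, since sampling from the predicted distribution (rather than always taking the most likely character) is what helps keep those repeated phrases from looping. This assumes the trained model has been rebuilt with a batch size of 1, per the TensorFlow tutorial’s pattern, and reuses the lookup tables from earlier:

```python
import tensorflow as tf

# Rebuild for single-sequence generation and load the trained weights, e.g.:
# model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
# model.load_weights(checkpoint_path)

def generate_text(model, start_string, num_generate=100, temperature=1.0):
    # Vectorize the seed text with the char -> int lookup table
    input_ids = tf.expand_dims([char2idx[ch] for ch in start_string], 0)

    generated = []
    model.reset_states()
    for _ in range(num_generate):
        logits = model(input_ids)
        # Only the prediction for the final timestep matters here
        logits = logits[:, -1, :] / temperature
        # Sample from the distribution instead of taking the argmax;
        # this randomness helps break up repeated phrases
        predicted_id = tf.random.categorical(logits, num_samples=1)[0, 0].numpy()
        input_ids = tf.expand_dims([predicted_id], 0)
        generated.append(idx2char[predicted_id])

    return start_string + ''.join(generated)

print(generate_text(model, start_string="im like "))
```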

This was not the end, however; I wanted to be able to share this creation with everyone who was interested. Luckily, I had heard about a service called Streamlit that allows folks like me to host their apps on the internet using its API/Python package. It was super simple to get set up and use, and I hope to share my experience in a future post.

Snippet of web app deployment
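For reference, a Streamlit front end can be as small as this hypothetical sketch; the `generate_text` helper and trained `model` come from the sketches above, and the “/” characters mark bar breaks in the generated output:

```python
# app.py (run with: streamlit run app.py)
import streamlit as st

st.title("Rap-Bot")
seed = st.text_input("Seed words for the model:", value="im like")

if st.button("Generate lyrics"):
    lyrics = generate_text(model, start_string=seed)
    # Turn the "/" bar markers into real line breaks for display
    st.text(lyrics.replace("/", "\n"))
```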

Rapping Up

The best part about this project was using a new technology that I learned about to do something cool and interesting, no strings attached. I was never too concerned with creating the perfect copy, only being able to create anything at all.

I would wholeheartedly recommend that anyone interested in deep learning or RNNs look into generating text. Taking this route allowed me to learn more about applying NLP techniques in Python while broadening my deep learning experience with TensorFlow. I decided to have fun with it, but I am sure there are plenty of other business applications for this technology. In the future, I hope to explore more work in this realm of NLP, and potentially expand upon this text generation project as well.
