Imagine a world where technology meets the rich cultural tapestry of Arabic music, where AI doesn’t just replicate but resonates with the emotions, dialects, and rhythms of the region. This is the heart of my latest project—an endeavor that merges the power of GPT-2 with the soul of Arabic song lyrics. By fine-tuning a model on the HABIBI dataset, I’ve crafted a tool capable of generating new, original verses that echo the poetic nuances and diverse dialects that define Arabic music. This isn’t just AI; it’s a bridge between tradition and innovation, between past and future melodies.
In this NLP project, we leveraged the HABIBI dataset as the cornerstone for our Arabic lyrics generator. The dataset is a rich repository of 30,072 Arabic songs, spanning a broad spectrum of genres and styles, performed by 1,755 different artists. It offers a diverse linguistic landscape, featuring lyrics in six distinct dialects: Meghribi, Gulf, Iraqi, Sudanese, Egyptian, and Levantine.
The HABIBI dataset captures the essence of Arabic music by incorporating variations in dialect, singer, composer, songwriter, nationality, and song title. This diversity ensures that the generated lyrics resonate with Arabic speakers across different regions and linguistic backgrounds.
Before delving into the results and analysis, it’s crucial to understand the dataset’s dialect distribution. The pie chart below illustrates the makeup of the dataset by dialect, highlighting the linguistic diversity and cultural nuances that our generator seeks to capture. This distribution provides valuable insights into the regional variations and subtleties that will influence the generated lyrics.
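For readers who want to reproduce this kind of overview, a minimal sketch is shown below. It assumes the dataset has been exported to a CSV with a `Dialect` column; the file name and column name are illustrative, not the project's exact schema.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV export of the HABIBI dataset, one row per song
df = pd.read_csv("habibi.csv")

# Count songs per dialect and plot the distribution as a pie chart
counts = df["Dialect"].value_counts()
counts.plot.pie(autopct="%1.1f%%", figsize=(6, 6), ylabel="")
plt.title("HABIBI dataset: songs per dialect")
plt.show()
```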
We began by preparing our dataset of Arabic song lyrics, sourced from various collections. To focus on lyric generation, we removed irrelevant metadata such as song titles, songwriters, composers, and singer nationalities, and cleaned the text by stripping out punctuation.
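A rough sketch of this cleaning step is shown below. The column names (`Lyrics`, `SingerNationality`, and so on) are placeholders rather than the project's actual schema.

```python
import string
import pandas as pd

# Hypothetical column names; the actual HABIBI schema may differ
df = pd.read_csv("habibi.csv")
df = df.drop(
    columns=["SongTitle", "Songwriter", "Composer", "SingerNationality"],
    errors="ignore",
)

# Strip Latin and Arabic punctuation from the lyrics text
punct = string.punctuation + "،؛؟«»…"
df["Lyrics"] = df["Lyrics"].astype(str).apply(
    lambda text: text.translate(str.maketrans("", "", punct))
)
```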
For tokenization, we employed the GPT-2 model's tokenizer, which uses the Byte-Pair Encoding (BPE) algorithm. This method efficiently breaks down text into subword units, a crucial feature for handling the complex morphology of the Arabic language. The tokenizer leverages a pre-defined vocabulary learned during GPT-2’s pre-training, ensuring compatibility and consistency with the model. This setup allows for accurate encoding of text into numerical sequences and seamless decoding of generated outputs, ultimately enhancing the quality and contextual relevance of the generated lyrics.
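To illustrate, encoding and decoding a short Arabic phrase with the stock GPT-2 tokenizer looks like this; it is not the project's exact setup, but it uses the same API. Because the stock vocabulary was learned mostly on English text, Arabic is split into many byte-level subword pieces, yet the round trip is lossless.

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

line = "يا ليل يا عين"            # a short Arabic phrase
ids = tokenizer.encode(line)      # text -> token ids (byte-level BPE)
print(ids)
print(tokenizer.decode(ids))      # ids -> text, recovers the original phrase
```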
The GPT-2 model, once a cutting-edge innovation by OpenAI, now feels like legacy technology a year later. Despite this, GPT-2—short for Generative Pre-trained Transformer 2—was a game-changer in its time, excelling in tasks like text generation, translation, summarization, and question answering.
As a transformer-based model, GPT-2 utilizes a self-attention mechanism to grasp relationships between words, enabling it to produce coherent and contextually relevant text. A key strength of GPT-2 lies in its extensive pre-training on vast internet text, which equips it with a solid understanding of grammar, syntax, and linguistic patterns. This foundation allows GPT-2 to generate text that is both accurate and contextually meaningful.
The fine-tuning process was feasible using a free-tier GPU on Kaggle, making it an accessible option for this task.

We implemented a custom SongLyrics class to format the lyrics for GPT-2 training. This class used the GPT2Tokenizer to tokenize lyrics into subword units, the format required by the model. We set a maximum sequence length to ensure efficient training and avoid memory issues.
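A minimal sketch of such a dataset class, assuming the standard PyTorch `Dataset` API, might look like the following; the project's actual `SongLyrics` implementation may handle truncation and special tokens slightly differently.

```python
import torch
from torch.utils.data import Dataset
from transformers import GPT2Tokenizer

class SongLyrics(Dataset):
    """Tokenizes each song's lyrics into a tensor of GPT-2 token ids."""

    def __init__(self, lyrics_list, max_length=400, gpt2_type="gpt2"):
        self.tokenizer = GPT2Tokenizer.from_pretrained(gpt2_type)
        self.examples = []
        for text in lyrics_list:
            # Tokenize, truncate to max_length, and append the end-of-text token
            ids = self.tokenizer.encode(text)[:max_length]
            ids.append(self.tokenizer.eos_token_id)
            self.examples.append(torch.tensor(ids))

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]
```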
For training, we used a train function that handled batch processing and gradient accumulation to stabilize the process. The AdamW optimizer, combined with weight decay, was employed to update model parameters and minimize training loss. We managed the learning rate with a linear schedule that increased during warm-up steps and decreased thereafter, helping prevent overfitting.
We monitored loss values to ensure the model was learning effectively, and trained for a set number of epochs, saving the model at each stage with the option to checkpoint progress. After training, we saved the fine-tuned model, which captured the linguistic and lyrical nuances of Arabic songs, allowing it to generate coherent and contextually relevant lyrics based on user prompts.
```python
import os
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import get_linear_schedule_with_warmup

# Function for training the model
def train(
    dataset,
    model,
    tokenizer,
    batch_size=16,
    epochs=20,
    lr=2e-5,
    max_seq_len=400,
    warmup_steps=200,
    gpt2_type="gpt2",
    output_dir=".",
    output_prefix="wreckgar",
    test_mode=False,
    save_model_on_epoch=False,
):
    # Set up training parameters
    acc_steps = 100
    device = torch.device("cuda")
    model = model.cuda()
    model.train()

    optimizer = AdamW(model.parameters(), lr=lr)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=warmup_steps, num_training_steps=-1
    )

    train_dataloader = DataLoader(dataset, batch_size=1, shuffle=True)

    loss = 0
    accumulating_batch_count = 0
    input_tensor = None

    # Iterate over epochs
    for epoch in range(epochs):
        print(f"Training epoch {epoch}")
        print(loss)

        # Iterate over batches in the dataloader
        for idx, entry in tqdm(enumerate(train_dataloader)):
            # Pack the input tensors based on the maximum sequence length
            (input_tensor, carry_on, remainder) = pack_tensor(entry, input_tensor, 768)

            # Keep packing songs together until the packed tensor is full
            if carry_on and idx != len(train_dataloader) - 1:
                continue

            input_tensor = input_tensor.to(device)
            outputs = model(input_tensor, labels=input_tensor)
            loss = outputs[0]
            loss.backward()

            # Step the optimizer only every `batch_size` accumulated batches
            if (accumulating_batch_count % batch_size) == 0:
                optimizer.step()
                scheduler.step()
                optimizer.zero_grad()
                model.zero_grad()

            accumulating_batch_count += 1
            input_tensor = None

        # Save the model at each epoch if specified
        if save_model_on_epoch:
            torch.save(
                model.state_dict(),
                os.path.join(output_dir, f"{output_prefix}-{epoch}.pt"),
            )

    return model
```
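The `train` function above relies on a `pack_tensor` helper defined elsewhere in the notebook, which concatenates consecutive songs into a single tensor up to a maximum length so that short songs are packed together rather than padded. A sketch of what that helper does is shown below; the notebook's exact version may differ.

```python
import torch

def pack_tensor(new_tensor, packed_tensor, max_seq_len):
    # Nothing packed yet: start a new packed tensor with this song
    if packed_tensor is None:
        return new_tensor, True, None
    # Adding this song would exceed the limit: emit the packed tensor
    # for a training step and return the new song as the remainder
    if packed_tensor.size(1) + new_tensor.size(1) > max_seq_len:
        return packed_tensor, False, new_tensor
    # Otherwise append the song and keep packing
    packed_tensor = torch.cat([packed_tensor, new_tensor], dim=1)
    return packed_tensor, True, None
```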
More details can be found in training-code.ipynb.
To generate Arabic song lyrics, we used the fine-tuned GPT-2 model in combination with a custom generate function. This function took inputs such as the model, tokenizer, prompt, and settings like the number of entries and entry length. The model predicted each token iteratively, using previously generated tokens as context.
During generation, we balanced creativity and adherence to the input prompt using probability distributions. The top_p parameter controlled the diversity of the outputs, with higher values allowing for more varied lyrics and lower values focusing on high-probability tokens. The temperature parameter managed the randomness, where higher values led to more creative outputs and lower values resulted in more predictable lyrics. The process continued until the desired length or a stopping token was reached.
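As an illustration of these sampling parameters, the same effect can be obtained through the Hugging Face `generate` API; the project uses its own custom loop, so treat this as a sketch rather than the project code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")   # load the fine-tuned weights here
model.eval()

prompt = "يا حبيبي"                                # an example Arabic prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        do_sample=True,       # sample instead of greedy decoding
        top_p=0.8,            # nucleus sampling: keep the smallest token set
                              # whose cumulative probability exceeds 0.8
        temperature=1.0,      # >1 = more random, <1 = more predictable
        max_length=100,       # stop after this many tokens
        pad_token_id=tokenizer.eos_token_id,
    )
```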
We then evaluated the generated lyrics by comparing them with true endings from the test set to assess their quality and coherence. The fine-tuned GPT-2 model successfully produced lyrics that maintained the lyrical style, semantic coherence, and thematic relevance of Arabic songs.
The model's output consisted of tokens, representing words or subwords. To convert these tokens into readable text, we used the tokenizer to map each token to its corresponding word in Arabic. This decoding step transformed the generated token sequences into human-readable lyrics, making them ready for evaluation and interpretation.
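Continuing the generation sketch above, this decoding step is a single tokenizer call:

```python
# Map the generated token ids back to readable Arabic text
lyrics = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(lyrics)
```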
More details can be found in generate-text.ipynb.
We evaluated the performance of our models using two metrics: BLEU score and human evaluation, applied to 10 songs from each dialect that were excluded from the training dataset.
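A minimal sketch of the BLEU computation, assuming NLTK's `sentence_bleu`, is shown below; the held-out true ending serves as the reference and the model's continuation as the hypothesis. The project's exact evaluation code may differ.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "true ending of the song".split()        # held-out lyrics (tokenized)
hypothesis = "generated ending of the song".split()  # model output (tokenized)

# Smoothing avoids zero scores when higher-order n-grams have no overlap
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
print(f"BLEU: {score:.4f}")
```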
The base GPT-2 model, without fine-tuning, tends to generate English words, as it was not pre-trained on Arabic. Conversely, the model fine-tuned on the entire corpus can distinguish between dialects and generate lyrics consistent with the prompt's dialect.
| Model | Full Corpus | Gulf | Egyptian | Levantine | Iraqi | Sudan | Meghribi |
|---|---|---|---|---|---|---|---|
| Avg BLEU Score | 0.6810 | 0.6817 | 0.693 | 0.687 | 0.668 | 0.693 | 0.698 |
Across the 10 randomly selected test songs for each dialect, the average BLEU scores obtained by the model trained on the whole corpus are very close to one another.
We trained separate models for each dialect and compared their outputs to those of a model trained on the entire corpus.
The table above shows sample results for the Egyptian dialect. In the first and second rows, both models (the one trained on the whole corpus and the one trained on Egyptian songs only) performed similarly, which is unexpected given that one was trained on more diverse data than the other. For the third prompt, both models produced nearly accurate output, generating the same true ending of the lyrics.
The table above shows sample results for the Gulf dialect. In the first row, the model trained on a single dialect performed poorly, even generating an English word, while the model trained on the whole corpus performed better, as expected. For the remaining samples, the full-corpus model showed a slight improvement, generating more legible sentences than the other model.
In the table above, the outputs of the two models differ from each other and from the true ending lyrics; however, both generated lyrics are lexically correct and rhyme to some extent.
In the table above, the first two rows showcase similar results from both models, although the output from the full corpus model is more legible, as anticipated. However, the results in the last row are unexpected. The model trained solely on the Meghribi dialect outperformed the model trained on the entire corpus significantly, while the latter produced notably poor results, even generating English words.
Apologies to non-Arabic readers/viewers—poetry often relies on nuances that direct translations may not fully capture.
In this project, we explore the intersection of artificial intelligence and Arabic music through the generation of song lyrics using the GPT-2 model. By fine-tuning GPT-2 on a curated dataset of Arabic song lyrics, known as the HABIBI dataset, we aimed to produce lyrics that authentically reflect the style, emotion, and linguistic diversity of Arabic music. The generated lyrics not only maintain the essence of traditional Arabic songs but also adapt to various dialects, offering a unique blend of cultural heritage and modern technology. This work highlights the potential of AI to contribute creatively to the preservation and evolution of regional music traditions.
Code Repository
The project in this publication is implemented in this repository:
https://github.com/Mahmoud-Hesham99/Arabic-Lyrics-Generation