An IndoBERT-based framework for emotion classification in Indonesian song lyrics

Agustar Alfonso; Fitri Insani; Okfalisa Okfalisa; Muhammad Fikry; Fitra Kurnia; Sri Wahyuni

doi:10.59190/stc.v6i3.372

Authors

Agustar Alfonso, Department of Informatics Engineering, UIN Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia
Fitri Insani, Department of Informatics Engineering, UIN Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia
Okfalisa Okfalisa, Department of Informatics Engineering, UIN Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia
Muhammad Fikry, Department of Informatics Engineering, UIN Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia
Fitra Kurnia, Department of Informatics Engineering, UIN Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia
Sri Wahyuni, Department of Psychology, UIN Sultan Syarif Kasim Riau, Pekanbaru 28293, Indonesia

Keywords:

Emotion Classification, Fine-Tuning, IndoBERT, Song Lyrics, Transformer Model

Abstract

Emotion classification in song lyrics is a significant research area within natural language processing, yet studies targeting Indonesian-language lyrics remain scarce because labeled datasets are limited and domain-specific models are lacking. This study developed and evaluated an emotion classification model for Indonesian song lyrics by fine-tuning IndoBERT-base-p2, a transformer-based language model pre-trained on a large Indonesian corpus. A dataset of 1,025 labeled lyric entries was compiled from Kaggle, Genius, and KapanLagi, covering four emotion categories: joy, sadness, fear, and anger. Preprocessing comprised duplicate removal, case folding, structural marker removal, and removal of non-alphabetic characters. Nine fine-tuning experiments were conducted by systematically varying the learning rate and the dropout rate, with early stopping based on validation loss. The best configuration used a learning rate of 3 × 10^-5 and a dropout rate of 0.1, achieving 75.73% accuracy and a macro-averaged F1-score of 75.85% on the held-out test set. Joy and anger were classified most reliably, with F1-scores of 82.76% and 76.47%, respectively. Sadness was the most difficult class: it had the lowest precision (64.10%) alongside a recall of 80.65%, indicating that the model systematically over-predicts this class. These findings indicate that IndoBERT-base-p2, when fine-tuned with an appropriate hyperparameter configuration, is an effective approach for domain-specific emotion classification in Indonesian song lyrics.
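The four preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `clean_lyric` and `preprocess` are hypothetical, and the sketch assumes that structural markers are square-bracketed section tags such as "[Chorus]" and that duplicates are exact repeats of the raw text. The abstract specifies neither detail.

```python
import re

# Assumption: structural markers are bracketed section tags like "[Chorus]".
BRACKET_MARKER = re.compile(r"\[[^\]]*\]")
# After case folding, anything that is not a-z or whitespace is non-alphabetic.
NON_ALPHA = re.compile(r"[^a-z\s]")
WHITESPACE = re.compile(r"\s+")


def clean_lyric(text):
    """Case-fold one lyric, drop bracketed markers and non-alphabetic characters."""
    text = text.lower()                   # case folding
    text = BRACKET_MARKER.sub(" ", text)  # structural marker removal
    text = NON_ALPHA.sub(" ", text)       # non-alphabetic character cleaning
    return WHITESPACE.sub(" ", text).strip()


def preprocess(lyrics):
    """Remove exact duplicate entries (keeping first occurrences), then clean each lyric."""
    seen = set()
    cleaned = []
    for raw in lyrics:
        if raw in seen:  # duplicate removal on the raw text (an assumption)
            continue
        seen.add(raw)
        cleaned.append(clean_lyric(raw))
    return cleaned


if __name__ == "__main__":
    sample = ["[Chorus] Aku Rindu, 100% padamu!", "[Chorus] Aku Rindu, 100% padamu!"]
    print(preprocess(sample))
```

Removing markers before stripping non-alphabetic characters matters in this sketch: if the brackets were stripped first, the tag text (for example "chorus") would remain in the lyric as an ordinary word.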