An IndoBERT-based framework for emotion classification in Indonesian song lyrics

Authors

  • Agustar Alfonso, Department of Informatics Engineering, UIN Sultan Syarif Kasim, Pekanbaru 28293, Indonesia
  • Fitri Insani, Department of Informatics Engineering, UIN Sultan Syarif Kasim, Pekanbaru 28293, Indonesia
  • Okfalisa Okfalisa, Department of Informatics Engineering, UIN Sultan Syarif Kasim, Pekanbaru 28293, Indonesia
  • Muhammad Fikry, Department of Informatics Engineering, UIN Sultan Syarif Kasim, Pekanbaru 28293, Indonesia
  • Fitra Kurnia, Department of Informatics Engineering, UIN Sultan Syarif Kasim, Pekanbaru 28293, Indonesia
  • Sri Wahyuni, Department of Psychology, UIN Sultan Syarif Kasim, Pekanbaru 28293, Indonesia

DOI:

https://doi.org/10.59190/stc.v6i3.372

Keywords:

Emotion Classification, Fine-Tuning, IndoBERT, Song Lyrics, Transformer Model

Abstract

Emotion classification in song lyrics is an active research area within natural language processing, yet studies targeting Indonesian-language lyrics remain scarce because labeled datasets are limited and domain-specific models are lacking. This study developed and evaluated an emotion classification model for Indonesian song lyrics using fine-tuned IndoBERT-base-p2, a transformer-based language model pre-trained on a large Indonesian corpus. A dataset of 1,025 labeled lyric entries was compiled from Kaggle, Genius, and KapanLagi, covering four emotion categories: joy, sadness, fear, and anger. Preprocessing comprised duplicate removal, case folding, structural marker removal, and removal of non-alphabetic characters. Nine fine-tuning experiments were conducted by systematically varying the learning rate and the dropout rate, with early stopping based on validation loss. The best configuration used a learning rate of 3 × 10⁻⁵ and a dropout rate of 0.1, achieving 75.73% accuracy and a 75.85% macro-averaged F1-score on the held-out test set. Joy and anger were classified most reliably, with F1-scores of 82.76% and 76.47% respectively. Sadness was the most difficult class: it had the lowest precision (64.10%) alongside a recall of 80.65%, which indicates a systematic tendency of the model to over-predict this class. These findings indicate that IndoBERT-base-p2, when fine-tuned with an appropriate hyperparameter configuration, is an effective approach for domain-specific emotion classification in Indonesian song lyrics.
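
The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `clean_lyric` and `preprocess`, the list of structural markers (for example "[Chorus]" or "(Verse 1)"), and the choice to remove duplicates after cleaning rather than before are all assumptions made for illustration.

```python
import re

# Assumption: structural markers are bracketed section labels such as
# "[Chorus]" or "(Verse 1)"; the actual marker set used in the study is unknown.
MARKER = re.compile(
    r"[\[\(]\s*(?:intro|verse|pre-chorus|chorus|reff|bridge|outro)[^\]\)]*[\]\)]"
)
NON_ALPHA = re.compile(r"[^a-z\s]")   # anything that is not a-z or whitespace
SPACES = re.compile(r"\s+")


def clean_lyric(text: str) -> str:
    """Apply case folding, structural marker removal, and non-alphabetic cleaning."""
    text = text.lower()                 # case folding
    text = MARKER.sub(" ", text)        # drop section labels
    text = NON_ALPHA.sub(" ", text)     # drop digits, punctuation, symbols
    return SPACES.sub(" ", text).strip()


def preprocess(lyrics):
    """Clean every lyric and keep only the first occurrence of each cleaned text."""
    seen = set()
    cleaned = []
    for raw in lyrics:
        lyric = clean_lyric(raw)
        # Assumption: duplicates are detected on the cleaned text, and
        # entries that become empty after cleaning are dropped.
        if lyric and lyric not in seen:
            seen.add(lyric)
            cleaned.append(lyric)
    return cleaned
```

In this sketch, deduplicating on the cleaned text also catches entries that differ only in casing or punctuation; deduplicating the raw text first, in the order the abstract lists the steps, would be an equally reasonable reading.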

Published

2026-06-10

How to Cite

Alfonso, A., Insani, F., Okfalisa, O., Fikry, M., Kurnia, F., & Wahyuni, S. (2026). An IndoBERT-based framework for emotion classification in Indonesian song lyrics. Science, Technology, and Communication Journal, 6(3), 219-228. https://doi.org/10.59190/stc.v6i3.372
