Nutri-score classification of snack products using word embedding and random forest

Authors

  • Onky Wanda Darmawan Department of Informatics Engineering, Universitas Sains dan Teknologi Indonesia, Pekanbaru 28299, Indonesia
  • Junadhi Junadhi Department of Informatics Engineering, Universitas Sains dan Teknologi Indonesia, Pekanbaru 28299, Indonesia
  • Lusiana Efrizoni Department of Informatics Engineering, Universitas Sains dan Teknologi Indonesia, Pekanbaru 28299, Indonesia
  • Nurjayadi Nurjayadi Department of Informatics Engineering, Universitas Sains dan Teknologi Indonesia, Pekanbaru 28299, Indonesia

DOI:

https://doi.org/10.59190/stc.v6i3.393

Keywords:

Healthy Food, Word2Vec, GloVe, FastText, Random Forest

Abstract

The increasing consumption of packaged snack products has raised concerns regarding their nutritional quality and potential health impacts. Although nutritional information is commonly provided on food packaging, many consumers experience difficulties in interpreting ingredient descriptions and nutritional labels, making it challenging to identify whether a product is healthy or unhealthy. Therefore, an automated classification system is needed to assist consumers in understanding nutritional information more effectively. This study proposes a text-based classification framework for categorizing snack products into healthy and unhealthy classes using Natural Language Processing (NLP), word embedding techniques, and the Random Forest algorithm. The dataset was obtained from the Open Food Facts database and filtered to include snack products only. After preprocessing and class balancing, a total of 465 samples were used for model development and evaluation. The preprocessing stage consisted of case folding, tokenization, stopword removal, and stemming. Three word embedding techniques, namely Word2Vec, GloVe, and FastText, were employed to transform textual ingredient descriptions into numerical feature representations. Subsequently, Random Forest was utilized as the classification algorithm, and its performance was evaluated using Accuracy, Balanced Accuracy, Precision, Recall, F1-score, and Macro F1-score. The experimental results show that GloVe achieved the best performance among the evaluated embedding methods, obtaining an accuracy of 86.02%, balanced accuracy of 84.72%, precision of 85.98%, recall of 86.02%, F1-score of 85.91%, and macro F1-score of 85.19%. The findings indicate that GloVe provides a more effective semantic representation of food-related textual information compared to Word2Vec and FastText. Overall, the proposed framework demonstrates the potential of NLP-based approaches for automated nutritional assessment and healthy food classification.
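
The evaluation metrics named in the abstract can be illustrated with a minimal, standard-library sketch. The abstract does not state how precision, recall, and F1-score were averaged across the two classes. Support-weighted averaging is assumed here because the reported recall (86.02%) equals the reported accuracy, which is what weighted averaging always produces. The function names and the toy labels below are hypothetical and are not data from the study.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, tp + fn)
    return scores


def summarize(y_true, y_pred):
    """Compute the six metrics listed in the abstract.

    Assumption: precision, recall, and F1 are support-weighted averages;
    macro F1 and balanced accuracy are unweighted means over classes.
    """
    scores = per_class_scores(y_true, y_pred)
    n, k = len(y_true), len(scores)

    def weighted(i):
        # Average column i of the per-class scores, weighted by class support.
        return sum(s[i] * s[3] for s in scores.values()) / n

    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "balanced_accuracy": sum(s[1] for s in scores.values()) / k,
        "precision": weighted(0),
        "recall": weighted(1),
        "f1": weighted(2),
        "macro_f1": sum(s[2] for s in scores.values()) / k,
    }


# Hypothetical toy labels (not from the paper): 4 healthy, 6 unhealthy.
y_true = ["healthy"] * 4 + ["unhealthy"] * 6
y_pred = ["healthy"] * 3 + ["unhealthy"] * 6 + ["healthy"]

metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On an imbalanced evaluation set, balanced accuracy and macro F1-score give each class equal weight, so they can fall below accuracy when the minority class is predicted less reliably.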

Published

2026-06-29

How to Cite

Darmawan, O. W., Junadhi, J., Efrizoni, L., & Nurjayadi, N. (2026). Nutri-score classification of snack products using word embedding and random forest. Science, Technology, and Communication Journal, 6(3), 373-384. https://doi.org/10.59190/stc.v6i3.393