Enhancing Indonesian hadith classification through multi-word embedding and support vector machine
DOI: https://doi.org/10.59190/stc.v6i3.384
Keywords: FastText, Hadith Classification, Multi-Word Embedding, Support Vector Machine, Word2Vec
Abstract
Hadith classification plays an important role in organizing and retrieving Islamic knowledge in digital environments. However, the growing volume of digital hadith collections makes manual classification difficult, so automated approaches are increasingly necessary. This study proposes a hadith text classification framework based on a support vector machine (SVM) and a multi-word embedding approach. The dataset was obtained from the Kaggle hadith dataset repository and consists of 34,441 hadith records. The texts were preprocessed through case folding, noise removal, stopword removal, and stemming before feature extraction. Three embedding strategies were evaluated: Word2Vec, FastText, and the proposed multi-word embedding, which combines Word2Vec and FastText representations through vector concatenation. The resulting feature vectors were classified using SVM and evaluated using accuracy, precision, recall, and F1-score. Experimental results show that the proposed multi-word embedding approach achieved the best performance, with an accuracy of 75.58%, precision of 75.68%, recall of 75.58%, and F1-score of 75.46%. This configuration outperforms Word2Vec + SVM and FastText + SVM, indicating that combining Word2Vec's context-based semantic information with FastText's subword-level information produces richer feature representations and improves classification effectiveness. The findings indicate that multi-word embedding is a promising approach for automated hadith text classification and can contribute to the development of intelligent Islamic information systems.
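The abstract names the pipeline steps but not how they are implemented. The sketch below shows one plausible reading of the feature construction and evaluation. It assumes several things the abstract does not state. Word vectors are mean-pooled into one document vector per model. Small hand-written lookup tables stand in for trained Word2Vec and FastText models. The stopword list and the category labels (`ibadah`, `muamalah`, `akhlak`) are hypothetical. Stemming and the SVM itself (for example scikit-learn's `SVC`) are omitted. The scores are computed with weighted averaging. Under weighted averaging, recall always equals accuracy, which matches the identical 75.58% figures reported, although the abstract does not say which averaging was used.

```python
import re
from collections import Counter

# Hypothetical, tiny Indonesian stopword list (illustration only).
STOPWORDS = {"dan", "yang", "di", "dari", "itu", "ini", "ke"}


def preprocess(text):
    """Case folding, noise removal and stopword removal; stemming is omitted."""
    text = text.lower()                    # case folding
    text = re.sub(r"[^a-z\s]", " ", text)  # noise removal: keep only a-z and whitespace
    return [t for t in text.split() if t not in STOPWORDS]


def mean_vector(tokens, table, dim):
    """Average the vectors of tokens present in `table`; all zeros if none are."""
    # Note: a real FastText model can also build vectors for unseen words from
    # character n-grams; a plain dict lookup like this one does not model that.
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def multi_word_embedding(tokens, w2v, w2v_dim, ft, ft_dim):
    """Concatenate the averaged Word2Vec and FastText document vectors."""
    return mean_vector(tokens, w2v, w2v_dim) + mean_vector(tokens, ft, ft_dim)


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    n = len(y_true)
    support = Counter(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    prec = rec = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = support[c] / n  # weight = share of true samples in class c
        prec += w * p_c
        rec += w * r_c      # sums to (total true positives) / n, i.e. accuracy
        f1 += w * f_c
    return acc, prec, rec, f1


# Usage with toy data (vectors and labels are made up for illustration).
tokens = preprocess("Dan Rasulullah bersabda: shalat itu tiang agama!")
w2v = {"shalat": [1.0, 0.0], "agama": [0.0, 1.0]}
ft = {"shalat": [0.5, 0.5, 0.0]}
features = multi_word_embedding(tokens, w2v, 2, ft, 3)
print(features)  # → [0.5, 0.5, 0.5, 0.5, 0.0]

y_true = ["ibadah", "ibadah", "muamalah", "akhlak"]
y_pred = ["ibadah", "muamalah", "muamalah", "akhlak"]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
```

Concatenation keeps the two representations side by side rather than averaging them. Each document vector therefore has `w2v_dim + ft_dim` dimensions, and the SVM can weight the Word2Vec and FastText components independently.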
License
Copyright (c) 2026 Mila Hastati, Junadhi Junadhi, Susi Erlinda, Agustin Agustin

This work is licensed under a Creative Commons Attribution 4.0 International License.