Application of the Support Vector Machine (SVM) Algorithm with TF-IDF N-Gram for Text Classification

Nur Arifin(1*), Ultach Enri(2), Nina Sulistiyowati(3)

(1) Universitas Singaperbangsa Karawang
(2) Universitas Singaperbangsa Karawang
(3) Universitas Singaperbangsa Karawang
(*) Corresponding Author

Abstract


Syntax Journal of Informatics is an information system that hosts a collection of scientific articles and is managed by the Informatics Study Program of Singaperbangsa Karawang University. The journal currently has no feature for categorizing articles by its focus and scope. This research uses text mining, a process that extracts important information from text, to classify articles automatically into the focus-and-scope categories listed on the journal's page. The method is Knowledge Discovery in Databases (KDD), with the stages data selection, preprocessing, transformation, modeling and evaluation. Classification is based on article titles, and the study compares the results across configurations. The algorithm is the Support Vector Machine (SVM), tested with four kernels: linear, polynomial, sigmoid and RBF. The data are divided into four train/test scenarios using train_test_split: 60:40, 70:30, 80:20 and 90:10. The tested models are evaluated by accuracy, precision, recall and F-measure. The best results, obtained with the linear kernel in the 90:10 scenario, are an accuracy of 70%, precision of 75%, recall of 69% and F-measure of 71%.
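The transformation stage named in the abstract weights word n-grams by TF-IDF. The sketch below is illustrative only and is not the authors' implementation: the example titles are made up, the n-gram range (unigrams plus bigrams) is an assumption, and the weighting uses the plain textbook form tf * log(N / df), which may differ from the variant used in the study (library implementations often add smoothing and normalization).

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=2):
    """Return lowercase word n-grams of length n_min..n_max."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_ngram(titles, n_min=1, n_max=2):
    """Map each title to {ngram: tf * idf}, with idf = log(N / df)."""
    docs = [Counter(word_ngrams(t, n_min, n_max)) for t in titles]
    n_docs = len(docs)

    # Document frequency: number of titles that contain each n-gram.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())

    weighted = []
    for counts in docs:
        total = sum(counts.values())
        weighted.append({
            gram: (count / total) * math.log(n_docs / df[gram])
            for gram, count in counts.items()
        })
    return weighted


# Hypothetical article titles, used only to exercise the functions.
titles = [
    "sentiment analysis using support vector machine",
    "expert system for diagnosing plant disease",
    "text classification using support vector machine",
]
vectors = tfidf_ngram(titles)
```

In this toy input, an n-gram that occurs in only one title (such as "expert system") receives a higher weight than one shared by two titles (such as "support vector"). The resulting weights are the kind of features an SVM classifier would then be trained on.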


Keywords


Support Vector Machine; Text Classification; TF-IDF; N-Gram




DOI: http://dx.doi.org/10.30998/string.v6i2.10133


Copyright (c) 2021 Nur Arifin, Ultach Enri, Nina Sulistiyowati

This work is licensed under a Creative Commons Attribution 4.0 International License.

 

STRING (Satuan Tulisan Riset dan Inovasi Teknologi) indexed by:


