Evaluating the Effectiveness of FastText Embedding and LSTM Networks in Phishing Email Detection

Sheptianna Healtha Rukiman; Alam Rahmatulloh

doi:10.30998/faktorexacta.v18i2.26769

Sheptianna Healtha Rukiman (1), Alam Rahmatulloh (2*)

(1) Informatics Department, Faculty of Engineering, Siliwangi University
(2) Informatics Department, Faculty of Engineering, Siliwangi University
(*) Corresponding Author

Abstract

Phishing emails are a significant cyber threat that calls for advanced detection methods. This study evaluates a model that combines FastText word embeddings with a Long Short-Term Memory (LSTM) neural network to identify them. Using a public dataset from Kaggle, the model was trained on 80% of the data and tested on the remaining 20%. The methodology comprised data preprocessing, vectorization with FastText to capture sub-word information, and sequential pattern recognition with the LSTM architecture. Performance was evaluated using accuracy, precision, recall, and F1-score; the model achieved 92% detection accuracy. The key challenges identified were class imbalance and high computational requirements. Future research could focus on model optimization and data augmentation to improve detection performance and address these limitations.
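
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (char_ngrams, split_80_20, binary_metrics), the fixed seed, and the toy labels are assumptions made for illustration only. The sketch shows the kind of character n-grams FastText builds sub-word vectors from, a shuffled 80/20 split, and the four reported metrics computed from a confusion matrix.

```python
import random


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in FastText-style '<' and '>' markers.

    Real FastText also keeps the whole word as its own token and hashes
    n-grams into buckets; both steps are omitted in this sketch.
    """
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def split_80_20(samples, seed=0):
    """Shuffle a copy of the samples and split it 80% train / 20% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (hypothetical, not the study's data): 10 phishing emails (1)
# among 100, of which a classifier flags only 5 and raises no false alarms.
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 5 + [0] * 95
scores = binary_metrics(y_true, y_pred)
```

On these toy labels accuracy is 0.95 while recall is only 0.5, which is why precision, recall, and F1-score are worth reporting alongside accuracy when classes are imbalanced.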

Full Text:

PDF (Indonesian)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
