2024 9th International Conference on Intelligent Computing and Signal Processing (ICSP)

Abstract

Spam classification is a popular topic in modern information security and personal privacy. Spam not only occupies network resources and causes network congestion, but also inflicts economic losses on recipients. With the development of artificial intelligence and information processing technology, content-based statistical techniques built on machine learning and deep learning have greatly improved the efficiency of spam classification. In this article we propose the “Spam-7 Category” taxonomy, which classifies spam into seven types: advertising emails, scanning emails, phishing emails, emails spreading malware, pornographic emails, chain emails, and false warning or notification emails. To address the shortage of Chinese spam datasets, we propose the Beijing Jiaotong University Email Spam (BJES) dataset. Building on a study of past spam detection techniques, we propose the LBSVM model (a neural network model based on SVM and BERT) for spam classification tasks. LBSVM achieved accuracies of 98.3% and 90.1% on the trec06c and BJES datasets, respectively. We also compared it experimentally with Naive Bayes (GaussianNB), Naive Bayes (MultinomialNB), SVM, and Logistic Regression to demonstrate its feasibility and efficiency in spam classification tasks.
