Content Based Spam Email Classification using Supervised SVM, Decision Trees and Naive Bayes

Conference: ICMLCA 2021 - 2nd International Conference on Machine Learning and Computer Application
December 17-19, 2021, in Shenyang, China

Proceedings: ICMLCA 2021

Pages: 4
Language: English
Type: PDF


Authors:
Cui, Jiaqi; Li, Xiaoxi (College of Computer Science, Sichuan University, Chengdu, China)

Abstract:
The prevalence of spam email has created an urgent need for anti-spam filters, and many algorithms have been proposed to classify spam. In this paper, we build a supervised classification pipeline that labels emails as spam or legitimate, and we evaluate three machine learning algorithms (SVMs, decision trees, and Naive Bayes) on this task. Feature selection is one of the main steps in spam email classification. We implement Term Frequency-Inverse Document Frequency (TF-IDF) and select the 20 most frequently used words in spam emails and in legitimate emails as features. We then run experiments with SVMs, decision trees, and Naive Bayes on the selected features and evaluate their capability and performance in spam email detection and classification.
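
The feature step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how TF-IDF and the top-20 word lists are combined, so the sketch assumes that the vocabulary is the union of the 20 most frequent words in each class and that each email is then weighted by a smoothed TF-IDF over that vocabulary. The helper names (`tokenize`, `top_k_words`, `tfidf_vectors`), the whitespace tokenizer, the smoothing formula, and the toy emails are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing is not described in the abstract.
    return text.lower().split()


def top_k_words(docs, k=20):
    # Return the k most frequent words across a list of documents.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(k)]


def tfidf_vectors(docs, vocabulary):
    # Build one TF-IDF vector per document, restricted to `vocabulary`.
    n_docs = len(docs)
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: number of documents containing each word.
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocabulary}
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocabulary}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty emails
        vectors.append([counts[w] / total * idf[w] for w in vocabulary])
    return vectors


# Toy emails invented for this sketch; they are not from the paper's data.
spam = ["win free money now", "free prize claim now"]
legitimate = ["meeting agenda for monday", "please review the attached report"]

# Assumed combination: union of the top-20 words of each class.
vocabulary = sorted(set(top_k_words(spam, 20) + top_k_words(legitimate, 20)))
X = tfidf_vectors(spam + legitimate, vocabulary)
y = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate
```

The resulting `X` and `y` have the shape expected by common supervised learners, so they could be passed to, for example, scikit-learn's `SVC`, `DecisionTreeClassifier`, or `MultinomialNB` to compare the three model families named in the abstract.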