Title :
Personalized Spam Filtering with Natural Language Attributes
Author :
Shams, Reza ; Mercer, Robert E.
Author_Institution :
Dept. of Comput. Sci., Univ. of Western Ontario, London, ON, Canada
Abstract :
Email spam is one of the biggest threats to today's Internet. To deal with this threat, many anti-spam filters have been developed. One big challenge for these filters is to predict the labels of emails in a personalized mailbox. In this paper, we report the performance of an anti-spam filter named Sentinel. In addition to some commonplace attributes, Sentinel uses attributes related to natural language stylometry. The filter has been tested with six benchmark datasets in the Enron-Spam collection. Classifiers generated by well-known meta-learning algorithms like AdaBoostM1 and Bagging tie for the best performance, while a Random Forest (RF) generated classifier performs almost as well. The performance of classifiers using Support Vector Machine (SVM) and Naive Bayes (NB) is not satisfactory. Comparisons show that the performance of Sentinel surpasses that of a number of state-of-the-art personalized filters proposed in previous studies.
Keywords :
Internet; e-mail filters; learning (artificial intelligence); natural language processing; random processes; unsolicited e-mail; ADABOOSTM1; BAGGING; Enron-Spam collection; Internet; SENTINEL; anti-spam filter; antispam filter; emails label prediction; meta learning algorithm; natural language attributes; natural language stylometry; personalized mailbox; personalized spam filtering; random forest; Indexes; Natural languages; Niobium; Radio frequency; Support vector machines; Unsolicited electronic mail; Spam filtering; machine learning application; natural language attributes; performance evaluation; text categorization;
Conference_Title :
Machine Learning and Applications (ICMLA), 2013 12th International Conference on
Conference_Location :
Miami, FL
DOI :
10.1109/ICMLA.2013.117