Title :
Effective spam classification based on meta-heuristics
Author :
Yeh, Chi-yuan ; Wu, Chih-Hung ; Doong, Shing-Hwang
Author_Institution :
Dept. of Inf. Manage., Shu-Te Univ., Kaohsiung, Taiwan
Abstract :
Using machine learning techniques such as naive Bayes, decision trees and support vector machines to automatically filter out spam e-mails has drawn many researchers' attention. Previous methods use keywords contained in e-mails to extract binary features from the corpus. However, since the keywords in e-mails change over time, the performance of keyword-based solutions is not stable. In this study, we use the behaviors of spammers as the features for classifying e-mails. Such behaviors are first described by meta-heuristics and then used as features of e-mails for classification. A total of 113 new features are extracted from the given meta-heuristics. Using existing machine learning techniques, the filtering performance is much better than that of keyword-based filtering. In addition, the training time is substantially reduced because of the low dimensional feature space and sparse feature vectors.
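The pipeline outlined in the abstract (behavior rules turned into binary features, then fed to a standard classifier) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not list the 113 features, so the four rules in `RULES`, the message dictionary layout, and the function names are all hypothetical, and the Bernoulli naive Bayes with Laplace smoothing is just one of the classifier families the abstract names, not necessarily the variant the authors used.

```python
import math
import re

# Hypothetical behavior rules; the paper's actual 113 features are not
# given in the abstract, so these are illustrative stand-ins only.
RULES = [
    ("subject_all_caps", lambda m: m.get("subject", "").isupper()),
    ("missing_to_header", lambda m: not m.get("to")),
    ("from_has_digit_run", lambda m: bool(re.search(r"\d{3,}", m.get("from", "")))),
    ("html_only_body", lambda m: m.get("body", "").lstrip().lower().startswith("<html")),
]


def extract_features(message):
    """Return a sparse binary vector as the set of indices of rules that fire."""
    return {i for i, (_, rule) in enumerate(RULES) if rule(message)}


def train(messages, labels):
    """Count, per class (0 = ham, 1 = spam), how often each rule fires."""
    counts = {0: [0] * len(RULES), 1: [0] * len(RULES)}
    totals = {0: 0, 1: 0}
    for message, label in zip(messages, labels):
        totals[label] += 1
        for i in extract_features(message):
            counts[label][i] += 1
    return counts, totals


def predict(model, message):
    """Bernoulli naive Bayes decision with Laplace (add-one) smoothing."""
    counts, totals = model
    active = extract_features(message)
    n = sum(totals.values())
    scores = {}
    for label in (0, 1):
        # Smoothed log prior for the class.
        score = math.log((totals[label] + 1) / (n + 2))
        for i in range(len(RULES)):
            # Smoothed probability that rule i fires in this class.
            p = (counts[label][i] + 1) / (totals[label] + 2)
            score += math.log(p if i in active else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)
```

Because each message is stored as the set of rules that fire, the feature vector stays sparse, and the number of model parameters grows with the number of rules rather than with the vocabulary of the corpus.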
Keywords :
belief networks; decision trees; learning (artificial intelligence); support vector machines; unsolicited e-mail; binary feature extraction; keyword-based filtering; low dimensional feature space; machine learning; meta-heuristics; naive Bayes; spam classification; spam e-mail; sparse feature vector; support vector machine; Bayesian methods; Electronic mail; Feature extraction; Genetic programming; Information filtering; Information filters; Unsolicited electronic mail; Naïve Bayesian; classification; spam
Conference_Title :
Systems, Man and Cybernetics, 2005 IEEE International Conference on
Print_ISBN :
0-7803-9298-1
DOI :
10.1109/ICSMC.2005.1571750