DocumentCode :
3045137
Title :
Genetic-based feature selection for spam detection
Author :
Arani, Seyyed Hossein Seyyedi ; Mozaffari, Saeed
Author_Institution :
Qazvin Branch, Islamic Azad Univ., Qazvin, Iran
fYear :
2013
fDate :
14-16 May 2013
Firstpage :
1
Lastpage :
6
Abstract :
In recent years, email has become a pervasive and economical means of communication, but spam has reduced its usefulness. To address this challenge, email filtering emerged and developed as a special kind of text classification. A main problem in text classification tasks, and a more serious one in email filtering, is the large number of features. To address it, various feature selection methods have been considered; they extract a lower-dimensional feature space from the original one and provide it as input to the classifier. We examine the effectiveness of two existing individual methods and propose a new combined method. The individual methods are Information Gain (IG) and the χ² statistic (CHI); the combined method applies a genetic algorithm (GA) to the top features selected by IG. A perceptron neural network serves as the classifier. The system was evaluated through experiments on the PU data set. The results show that the individual methods are very effective in reducing the dimensionality of the input space while increasing classifier performance, and that the combined method further improves performance even though it reduces dimensionality further.
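The combined method described in the abstract (rank features by IG, then run a GA over the top-ranked ones) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the GA operators, population size, or fitness function, so the truncation selection, one-point crossover, bit-flip mutation, and all parameter defaults below are assumptions, and `fitness` is a hypothetical callable standing in for classifier performance on the selected features. Documents are assumed to be sets of tokens.

```python
import math
import random


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term is present': H(C) - H(C | term)."""
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) \
        + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


def top_k_by_ig(docs, labels, k):
    """Return the k terms with the highest IG (ties keep alphabetical order)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


def ga_select(candidates, fitness, generations=20, pop_size=10,
              p_mut=0.1, seed=0):
    """Search for a subset of `candidates` that maximizes `fitness`.

    Assumed operators (not taken from the paper): keep the better half of
    the population, one-point crossover, independent bit-flip mutation.
    """
    rng = random.Random(seed)
    n = len(candidates)
    # Each individual is a boolean mask over the candidate features.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness([t for t, keep in zip(candidates, mask) if keep])

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children

    best = max(pop, key=score)
    return [t for t, keep in zip(candidates, best) if keep]
```

In a setup like the one the abstract describes, `fitness` would train and evaluate the classifier on the chosen features; any cheaper proxy is a further assumption.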
Keywords :
genetic algorithms; multilayer perceptrons; pattern classification; statistical analysis; text analysis; unsolicited e-mail; χ² statistic; CHI; GA; IG; combinational method; economical communication means; email; genetic algorithm; genetic-based feature selection; information gain; lower dimensional feature space; perceptron neural network; pervasive communication means; spam detection; text classification; Biological cells; Electronic mail; Feature extraction; Filtering; Neural networks; Text categorization; feature selection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical Engineering (ICEE), 2013 21st Iranian Conference on
Conference_Location :
Mashhad
Type :
conf
DOI :
10.1109/IranianCEE.2013.6599551
Filename :
6599551