Title :
Concentration based feature construction approach for spam detection
Author :
Tan, Ying ; Deng, Chao ; Ruan, Guangchen
Author_Institution :
Dept. of Machine Intell., Peking Univ., Beijing, China
Abstract :
Inspired by the human immune system, this paper proposes a concentration based feature construction (CFC) approach for spam detection that uses a two-element concentration vector as the feature vector. In the CFC approach, ‘self’ and ‘non-self’ concentrations are computed from ‘self’ and ‘non-self’ gene libraries, respectively, and are then combined into a two-element vector that characterizes each e-mail efficiently. As a result, designing the classifier amounts to establishing a mapping from two real-valued inputs to one binary output. E-mail classification is treated as an optimization problem that minimizes a formulated cost function, and a clonal particle swarm optimization (CPSO) algorithm, proposed by the lead author, is employed for this purpose. Several classifiers, including a linear discriminant, multi-layer neural networks and a support vector machine, are used to verify the effectiveness and robustness of the CFC approach. Experimental results demonstrate that the CFC approach is very fast and achieves accuracies of 97% and 99% on the PU1 and Ling corpora, respectively, using only the two-element concentration feature vector.
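The abstract does not give the exact definitions of the gene libraries or the concentrations, so the following Python sketch assumes simple ones purely for illustration. A term enters the ‘self’ (legitimate) or ‘non-self’ (spam) library when its document frequency differs between the two classes by more than a hypothetical `margin`, and each concentration is taken as the fraction of an e-mail's distinct terms found in that library. The function names are hypothetical, and a plain perceptron stands in for the linear discriminant; it is not the paper's CPSO-optimized cost function.

```python
import re
from collections import Counter


def tokenize(text):
    """Distinct lower-case word tokens (deliberately simple; an assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_gene_libraries(ham_docs, spam_docs, margin=0.2):
    """Hypothetical library construction: a term joins the 'self' library if its
    document frequency in ham exceeds that in spam by more than `margin`, and
    the 'non-self' library in the opposite case."""
    ham_df = Counter(t for d in ham_docs for t in tokenize(d))
    spam_df = Counter(t for d in spam_docs for t in tokenize(d))
    self_lib, nonself_lib = set(), set()
    for term in set(ham_df) | set(spam_df):
        # Counter returns 0 for terms that never occur in a class.
        tendency = ham_df[term] / len(ham_docs) - spam_df[term] / len(spam_docs)
        if tendency > margin:
            self_lib.add(term)
        elif tendency < -margin:
            nonself_lib.add(term)
    return self_lib, nonself_lib


def concentration_vector(text, self_lib, nonself_lib):
    """Two-element feature (assumed definition): the fractions of the e-mail's
    distinct terms that appear in the 'self' and 'non-self' libraries."""
    terms = tokenize(text)
    if not terms:
        return (0.0, 0.0)
    return (len(terms & self_lib) / len(terms), len(terms & nonself_lib) / len(terms))


def classify(vec, w, b):
    """Linear decision rule on the 2-D vector: 1 = spam, 0 = legitimate."""
    return 1 if w[0] * vec[0] + w[1] * vec[1] + b > 0 else 0


def train_linear_discriminant(features, labels, epochs=100, lr=0.1):
    """Perceptron training on the concentration vectors (a stand-in classifier,
    not the CPSO-based optimization used in the paper)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for vec, y in zip(features, labels):
            err = y - classify(vec, w, b)
            w[0] += lr * err * vec[0]
            w[1] += lr * err * vec[1]
            b += lr * err
    return w, b


if __name__ == "__main__":
    # Toy data invented for illustration only.
    ham = ["meeting agenda for the project review",
           "please review the attached project report",
           "lunch meeting tomorrow with the team"]
    spam = ["win free money now", "free prize claim now", "cheap money offer win now"]
    self_lib, nonself_lib = build_gene_libraries(ham, spam)
    feats = [concentration_vector(d, self_lib, nonself_lib) for d in ham + spam]
    w, b = train_linear_discriminant(feats, [0] * len(ham) + [1] * len(spam))
    vec = concentration_vector("claim your free money", self_lib, nonself_lib)
    print(vec, classify(vec, w, b))
```

Because each e-mail is reduced to two numbers, any classifier that maps two real-valued inputs to a binary label (a neural network or SVM, for example) could replace the perceptron here without changing the feature construction.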
Keywords :
classification; e-mail filters; particle swarm optimisation; unsolicited e-mail; clonal particle swarm optimization; concentration based feature construction; e-mail classification; feature vector; human immune system; spam detection; Cost function; Electronic mail; Humans; Immune system; Libraries; Multi-layer neural network; Particle swarm optimization; Robustness; Support vector machine classification; Support vector machines;
Conference_Title :
2009 International Joint Conference on Neural Networks (IJCNN 2009)
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-3548-7
ISSN :
1098-7576
DOI :
10.1109/IJCNN.2009.5178651