DocumentCode :
2330365
Title :
Extracting discriminative information from e-mail for spam detection inspired by Immune System
Author :
Zhu, Yuanchun ; Tan, Ying
Author_Institution :
Dept. of Machine Intell., Peking Univ., Beijing, China
fYear :
2010
fDate :
18-23 July 2010
Firstpage :
1
Lastpage :
7
Abstract :
Inspired by the biological immune system, we propose a local concentration (LC) based feature extraction approach for spam detection. A general anti-spam model is built that combines the LC approach with term selection methods and classifiers. In the LC model, a sliding window divides each message into local areas. In each area, a two-dimensional feature is constructed by calculating the spam and legitimate e-mail concentrations. The features of all areas are then combined into a single feature vector. Experiments are conducted on four benchmark corpora using 10-fold cross-validation. The results show that the LC approach can effectively extract position-correlated information from messages. Compared with the prevalent Bag-of-Words approach, the LC approach performs better in terms of both accuracy and F1 measure. Most significantly, it greatly reduces feature dimensionality and runs much faster.
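The windowing step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It rests on several assumptions that the abstract does not state: the message is split into a fixed number of non-overlapping areas, a "concentration" is taken to be the fraction of tokens in an area that belong to a given term set, and the names `lc_features`, `spam_terms`, `ham_terms`, and `n_areas` are hypothetical.

```python
from typing import List, Set


def lc_features(tokens: List[str], spam_terms: Set[str],
                ham_terms: Set[str], n_areas: int = 3) -> List[float]:
    """Return [spam_conc_1, ham_conc_1, ..., spam_conc_n, ham_conc_n].

    Assumption: concentration = share of an area's tokens found in a term set.
    """
    features: List[float] = []
    n = len(tokens)
    for i in range(n_areas):
        # Area i covers an (approximately) equal share of the message.
        start = i * n // n_areas
        end = (i + 1) * n // n_areas
        area = tokens[start:end]
        if area:
            spam_conc = sum(t in spam_terms for t in area) / len(area)
            ham_conc = sum(t in ham_terms for t in area) / len(area)
        else:
            # Empty area (message shorter than n_areas): no evidence either way.
            spam_conc = ham_conc = 0.0
        # Each area contributes one two-dimensional feature.
        features.extend([spam_conc, ham_conc])
    return features


if __name__ == "__main__":
    # Toy term sets standing in for the output of a term selection method.
    spam_terms = {"free", "winner", "cash"}
    ham_terms = {"meeting", "report", "agenda"}
    message = "free cash winner please see the meeting agenda report".split()
    print(lc_features(message, spam_terms, ham_terms, n_areas=3))
    # prints [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Under these assumptions the feature vector always has `2 * n_areas` entries regardless of message length or vocabulary size, which is in line with the dimensionality reduction over Bag-of-Words that the abstract reports.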
Keywords :
feature extraction; unsolicited e-mail; 10-fold cross-validation; anti-spam model; biological immune system; discriminative information extraction; e-mail; feature extraction; local concentration; spam detection; term selection method; Accuracy; Construction industry; Electronic mail; Feature extraction; Libraries; Pathogens; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6909-3
Type :
conf
DOI :
10.1109/CEC.2010.5586290
Filename :
5586290