Title :
Spam Filtering Based on Improved CHI Feature Selection Method
Author :
Lu, Zhimao ; Yu, Hongxia ; Fan, Dongmei ; Yuan, Chaoyue
Author_Institution :
Pattern Recognition & Natural Comput. Lab., Harbin Eng. Univ., Harbin, China
Abstract :
This paper studies feature selection methods used in spam filtering, including CHI square (CHI), Expected Cross Entropy (ECE), the Weight of Evidence for Text (WET), and Information Gain (IG), and proposes a novel modified CHI feature selection method for spam filtering. A spam filter combined with a Support Vector Machine (SVM) is used to evaluate CHI, ECE, WET, IG, and the modified CHI. The experiments show that the modified CHI improves the precision, recall, and F test measure of the spam filter, indicating that the modified CHI feature selection method is effective.
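For readers unfamiliar with the CHI baseline named in the abstract, the following is a minimal sketch of the standard textbook chi-square score of a term with respect to a class, computed from a 2x2 document-count table. It is not the modified CHI proposed in the paper, whose definition the abstract does not give. The function names `chi_square` and `chi_scores`, the token-set document representation, and the toy messages are all assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Standard chi-square score of a term for a class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_scores(docs, labels, target="spam"):
    """Score every vocabulary term against the target class.

    docs: list of sets of tokens (hypothetical representation)
    labels: list of class labels, one per document
    """
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    vocab = set().union(*docs) if docs else set()
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y == target and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if y != target and term in doc)
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    return scores


# Toy data, invented for illustration only.
docs = [
    {"free", "money"},
    {"free", "offer"},
    {"meeting", "notes"},
    {"project", "meeting"},
]
labels = ["spam", "spam", "ham", "ham"]
scores = chi_scores(docs, labels)

# Keep the highest-scoring terms as features for a downstream classifier.
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

In a pipeline like the one the abstract describes, the retained terms would form the feature vector passed to the SVM; the cutoff of two terms here is arbitrary.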
Keywords :
entropy; feature extraction; information filtering; statistical testing; support vector machines; text analysis; unsolicited e-mail; CHI feature selection method; CHI square method; ECE method; F test measure; IG; SVM; WET method; expected cross entropy method; information gain; spam filtering; support vector machine; weight-of-evidence-for-text method; Chaos; Entropy; Hilbert space; Information filtering; Information filters; Internet; Pattern recognition; Support vector machines; Testing; Unsolicited electronic mail;
Conference_Title :
Pattern Recognition, 2009. CCPR 2009. Chinese Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4199-0
DOI :
10.1109/CCPR.2009.5344010