DocumentCode :
2541260
Title :
Spam Filtering Based on Improved CHI Feature Selection Method
Author :
Lu, Zhimao ; Yu, Hongxia ; Fan, Dongmei ; Yuan, Chaoyue
Author_Institution :
Pattern Recognition & Natural Comput. Lab., Harbin Eng. Univ., Harbin, China
fYear :
2009
fDate :
4-6 Nov. 2009
Firstpage :
1
Lastpage :
3
Abstract :
This paper studies feature selection methods used in spam filtering, including chi-square (CHI), Expected Cross Entropy (ECE), Weight of Evidence for Text (WET), and Information Gain (IG), and proposes a novel modified CHI feature selection method. A spam filter based on a Support Vector Machine (SVM) is used to evaluate CHI, ECE, WET, IG, and the modified CHI. The experiments show that the modified CHI improves the precision, recall, and F test measure of the spam filter, indicating that the modified CHI feature selection method is effective.
Keywords :
entropy; feature extraction; information filtering; statistical testing; support vector machines; text analysis; unsolicited e-mail; CHI feature selection method; CHI square method; ECE method; F test measure; IG; SVM; WET method; expected cross entropy method; information gain; spam filtering; support vector machine; weight-of-evidence-for-text method; Chaos; Entropy; Hilbert space; Information filtering; Information filters; Internet; Pattern recognition; Support vector machines; Testing; Unsolicited electronic mail;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition, 2009. CCPR 2009. Chinese Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4199-0
Type :
conf
DOI :
10.1109/CCPR.2009.5344010
Filename :
5344010