DocumentCode
2824211
Title
Research and improvement for feature selection on naive bayes text classifier
Author
Guo Qiang
Author_Institution
Adv. Vocational Tech. Coll., Shanghai Univ. of Eng. Sci., Shanghai, China
Volume
2
fYear
2010
fDate
21-24 May 2010
Abstract
Effective feature selection is very important for a classifier, and practical tests confirm that an improved feature selection method can enhance classifier efficiency. This paper studies the principle, merits, and limitations of the prevalent feature selection method. It then adopts a two-stage selection modulus, calculated from the positions of paragraphs and sentences respectively, and takes the feature variance of the two phases into consideration. Finally, the paper applies the improved algorithm to spam filter categorization, a typical text classification task. Experiments show that, for a Naïve Bayes classifier, this method selects representative features more effectively than using the mutual information method alone.
Keywords
Bayes methods; learning (artificial intelligence); pattern classification; text analysis; feature selection method; mutual information method; naive bayes text classifier; spam filter categorization; text classification; two-stage selection modulus; Appraisal; Educational institutions; Electronic mail; Entropy; Frequency; Machine learning; Machine learning algorithms; Testing; Text categorization; Unsolicited electronic mail; Naive Bayes classifier; feature selection
fLanguage
English
Publisher
IEEE
Conference_Titel
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5821-9
Type
conf
DOI
10.1109/ICFCC.2010.5497362
Filename
5497362