Title :
Research and improvement for feature selection on naive bayes text classifier
Author_Institution :
Adv. Vocational Tech. Coll., Shanghai Univ. of Eng. Sci., Shanghai, China
Abstract :
Effective feature selection is very important for a classifier, and an improved feature selection method can enhance classifier efficiency, as practical tests validate. This paper studies the principles, merits, and limitations of the prevalent feature selection method. It then adopts a two-stage selection modulus, calculated from the positions of paragraphs and sentences respectively, and takes the feature variance of the two phases into consideration. Finally, the paper applies the improved algorithm to spam filter categorization, a typical text classification task. Experiments show that, with a Naïve Bayes classifier, this method selects representative features more effectively than using the mutual information method alone.
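For orientation, the baseline that the abstract compares against, feature selection by mutual information, can be sketched roughly as follows. This is a minimal illustration of the standard textbook formulation (mutual information between a term's presence and the class label), not the paper's two-stage method, whose position-based modulus and variance terms are not specified in this record. The function names, the whitespace tokenization, and the toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Score each term by the mutual information (in bits) between
    the term's presence in a document and the document's class label."""
    n = len(docs)
    class_counts = Counter(labels)
    # Assumption: simple whitespace tokenization, presence/absence only.
    doc_sets = [set(d.split()) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        mi = 0.0
        for present in (True, False):
            # Number of documents where the term is present (or absent).
            n_t = sum(1 for s in doc_sets if (term in s) == present)
            for c, n_c in class_counts.items():
                # Documents of class c with the same presence value.
                n_tc = sum(
                    1 for s, y in zip(doc_sets, labels)
                    if (term in s) == present and y == c
                )
                if n_tc == 0:
                    continue  # a zero joint count contributes nothing
                mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
        scores[term] = mi
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    scores = mutual_information(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, for illustration only.
docs = [
    "cheap pills now",
    "cheap offer now",
    "meeting agenda now",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]
top_terms = select_features(docs, labels, 2)
```

In this toy corpus, terms that occur in only one class score highest, while a term spread across both classes scores lower. The selected terms would then serve as the feature set for a Naive Bayes classifier; a position-aware scheme like the one the abstract describes would presumably reweight these scores, but that step is not reproduced here.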
Keywords :
Bayes methods; learning (artificial intelligence); pattern classification; text analysis; feature selection method; mutual information method; naive bayes text classifier; spam filter categorization; text classification; two-stage selection modulus; Appraisal; Educational institutions; Electronic mail; Entropy; Frequency; Machine learning; Machine learning algorithms; Testing; Text categorization; Unsolicited electronic mail; Naive Bayes classifier; feature selection; machine learning; text classification
Conference_Title :
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5821-9
DOI :
10.1109/ICFCC.2010.5497362