Title :
An Improved χ² (CHI) Statistics Method for Text Feature Selection
Author :
Yan, Tang; Ting, Xiao
Author_Institution :
College of Computer and Information Science, Southwest University, Chongqing, China
Abstract :
Feature selection is an active research topic, especially in text categorization. To overcome the shortcomings of the traditional χ² (CHI) approach, this paper proposes an improved χ² (CHI) statistics method. The method incorporates criteria from traditional statistical feature-selection methods, such as document frequency and class accuracy, into the χ² (CHI) statistic. Experimental results show that the proposed method is more effective than the traditional χ² (CHI) method.
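For reference, the traditional χ² (CHI) statistic that the paper sets out to improve is usually computed from a 2×2 contingency table of term occurrence versus class membership: χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)), where A and B count documents inside and outside class c that contain term t, C and D count documents inside and outside c that lack it, and N = A + B + C + D. The sketch below implements only this baseline; the paper's document-frequency and class-accuracy weighting is not specified in this record, so it is not reproduced. The function names (`chi_square`, `chi_scores`, `select_top_k`) are hypothetical, and ranking terms by their maximum score over classes is an assumed, commonly used aggregation rather than the authors' choice.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Traditional chi-square score for one (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Term appears in all/no documents, or class is empty/universal:
        # the table carries no evidence of dependence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def chi_scores(docs: List[Set[str]], labels: List[str]) -> Dict[Tuple[str, str], float]:
    """Score every (term, class) pair over a labelled corpus of term sets."""
    n = len(docs)
    class_sizes = Counter(labels)
    df_in_class: Counter = Counter()  # (term, class) -> docs of that class containing term
    df_total: Counter = Counter()     # term -> docs containing term
    for terms, label in zip(docs, labels):
        for t in terms:
            df_in_class[(t, label)] += 1
            df_total[t] += 1
    scores: Dict[Tuple[str, str], float] = {}
    for t in df_total:
        for cls, size in class_sizes.items():
            a = df_in_class[(t, cls)]
            b = df_total[t] - a
            c = size - a
            d = n - a - b - c
            scores[(t, cls)] = chi_square(a, b, c, d)
    return scores


def select_top_k(docs: List[Set[str]], labels: List[str], k: int) -> List[str]:
    """Keep the k terms with the highest chi-square score over any class (assumed max aggregation)."""
    best: Dict[str, float] = {}
    for (t, _), s in chi_scores(docs, labels).items():
        best[t] = max(best.get(t, 0.0), s)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


if __name__ == "__main__":
    docs = [{"ball", "team"}, {"ball", "goal"}, {"vote", "party"}, {"vote", "team"}]
    labels = ["sport", "sport", "politics", "politics"]
    print(select_top_k(docs, labels, 2))  # prints ['ball', 'vote']
```

In this toy corpus "team" occurs equally in both classes and scores 0, while "ball" and "vote" each occur only in one class and receive the highest scores. One known weakness of this baseline, which motivates refinements like the one proposed here, is that it ignores how often a term occurs overall and can favor rare terms.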
Keywords :
data mining; statistical analysis; χ² (CHI) statistics method; class accuracy criterion; document frequency criterion; feature selection; text categorization; Data mining; Educational institutions; Entropy; Frequency; Information science; Mutual information; Statistical analysis; Statistics; Text categorization; Text mining
Conference_Title :
2009 International Conference on Computational Intelligence and Software Engineering (CiSE 2009)
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
DOI :
10.1109/CISE.2009.5366401