Title of article :
Comparison of metrics for feature selection in imbalanced text classification
Author/Authors :
Ogura, Hiroshi and Amano, Hiromi and Kondo, Masato
Issue Information :
Periodical with serial number, year 2011
Pages :
12
From page :
4978
To page :
4989
Abstract :
Class imbalance problems are often encountered in real applications of automatic text classification, especially in so-called “one-against-all” settings, so handling them with satisfactory performance is of substantial importance. In this paper, we focus on feature selection as a means of addressing this problem and explore the abilities and characteristics of various feature selection metrics. We examine three types of metrics: Type-I (χ²_P and Gini index), Type-II (χ² and information gain), and Type-III (signed χ² and signed information gain). Type-I and Type-II metrics implicitly combine positive and negative features, which indicate membership and nonmembership of the positive class, respectively. Type-III metrics are used within the combination framework, in which positive and negative features are explicitly combined and the degree of combination is optimized to improve performance in imbalanced situations. Our experimental results show that feature selection using Type-I metrics on imbalanced data sets achieves classification performance comparable to that of the combination framework using Type-III metrics and is far superior to that of Type-II metrics. This result indicates that Type-I metrics can serve as simpler alternatives to the combination framework. The characteristic behavior and performance of each metric are also investigated closely in terms of the distribution and quality of the selected features.
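For readers unfamiliar with the Type-II and Type-III metrics named in the abstract, the following is a minimal sketch of how χ², information gain, and their signed variants are commonly computed from a 2x2 term/class contingency table. It is an illustration under stated assumptions, not the authors' implementation: the function names and the count layout (a, b, c, d) are hypothetical, and the sign convention (the sign of a*d - b*c) is the one commonly used for signed metrics and is assumed here. The Type-I metrics (χ²_P and Gini index) are not defined in this record and are therefore not sketched.

```python
import math


def chi2(a, b, c, d):
    """Chi-square statistic for one term against the positive class.

    Assumed count layout (hypothetical names):
      a: positive documents containing the term
      b: negative documents containing the term
      c: positive documents without the term
      d: negative documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (a * d - b * c) ** 2 / denom


def _entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log2(x / total) for x in counts if x > 0)


def info_gain(a, b, c, d):
    """Information gain of the term about the class label."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_class = _entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * _entropy([a, b]) + ((c + d) / n) * _entropy([c, d])
    return h_class - h_given_term


def signed(metric, a, b, c, d):
    """Signed variant: positive when the term indicates membership of the
    positive class, negative when it indicates nonmembership.
    Assumption: the sign is taken from a*d - b*c."""
    sign = 1.0 if a * d - b * c >= 0 else -1.0
    return sign * metric(a, b, c, d)
```

With this sketch, a term that appears only in positive documents receives a positive signed score, while a term that appears only in negative documents receives the same magnitude with a negative sign. The unsigned χ² and information gain give both terms the same score, which is the sense in which they combine positive and negative features implicitly.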
Keywords :
Poisson distribution, k-NN classifier, Imbalanced data, Combination framework, Text classification, Feature selection
Journal title :
Expert Systems with Applications
Serial Year :
2011
Record number :
2349167