• DocumentCode
    40211
  • Title
    A New Strategy of Cost-Free Learning in the Class Imbalance Problem
  • Author
    Xiaowan Zhang ; Bao-Gang Hu
  • Author_Institution
    Nat. Lab. of Pattern Recognition, Inst. of Autom., Beijing, China
  • Volume
    26
  • Issue
    12
  • fYear
    2014
  • fDate
    Dec. 2014
  • Firstpage
    2872
  • Lastpage
    2885
  • Abstract
    In this work, we define cost-free learning (CFL) formally in comparison with cost-sensitive learning (CSL). The main difference between them is that a CFL approach seeks optimal classification results without requiring any cost information, even in the class imbalance problem. In fact, several CFL approaches exist in the related studies, such as sampling and some criteria-based approaches. However, to the best of our knowledge, none of the existing CFL and CSL approaches can handle abstaining classifications properly when no information is given about errors and rejects. Based on information theory, we propose a novel CFL strategy that seeks to maximize the normalized mutual information between the targets and the decision outputs of classifiers. Using this strategy, we can handle binary and multi-class classifications, with or without abstaining. The new strategy exhibits significant features. As the degree of class imbalance changes, the proposed strategy balances the errors and rejects accordingly and automatically. Another advantage of the strategy is its ability to derive optimal rejection thresholds for abstaining classifications and the “equivalent” costs in binary classifications. The connection between rejection thresholds and the ROC curve is explored. An empirical investigation is carried out on several benchmark data sets in comparison with existing approaches. The classification results demonstrate the promise of the strategy in machine learning.
  • Keywords
    learning (artificial intelligence); pattern classification; sensitivity analysis; CFL approach; ROC curve; abstaining classification; binary classification; class imbalance problem; cost-free learning; information theory; machine learning; multiclass classification; normalized mutual information; optimal rejection thresholds; Cost function; Learning systems; Mutual information; Optimization; Probabilistic logic; Classification; ROC; abstaining; class imbalance; cost-free learning; cost-sensitive learning; mutual information;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2014.2312336
  • Filename
    6774882
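
The criterion named in the abstract, normalized mutual information between targets and classifier decisions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `normalized_mutual_information`, the confusion-matrix layout (rows are true classes, columns are decisions, with an optional extra column for rejects), and the choice to normalize by the target entropy H(T) are all assumptions made here for illustration, since the abstract does not state the exact normalization.

```python
import math


def normalized_mutual_information(confusion):
    """Return I(T;Y) / H(T) for a confusion matrix of counts.

    confusion[i][j] is the number of samples with true class i that
    received decision j. An extra trailing column may hold rejects
    (abstentions). Normalizing by H(T) is an assumption of this sketch.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")

    n_cols = len(confusion[0])
    # Marginal distribution of the targets (rows) and the decisions (columns).
    p_target = [sum(row) / total for row in confusion]
    p_decision = [sum(row[j] for row in confusion) / total for j in range(n_cols)]

    # Mutual information in bits; empty cells contribute nothing.
    mutual_info = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count > 0:
                p_joint = count / total
                mutual_info += p_joint * math.log2(p_joint / (p_target[i] * p_decision[j]))

    # Entropy of the targets; zero when only one class is present.
    target_entropy = -sum(p * math.log2(p) for p in p_target if p > 0)
    if target_entropy == 0:
        return 0.0
    return mutual_info / target_entropy


# Hypothetical 90:10 imbalanced data sets, invented for illustration.
perfect = [[90, 0], [0, 10]]              # every sample classified correctly
majority_only = [[90, 0], [10, 0]]        # everything assigned to the majority class
with_rejects = [[85, 2, 3], [1, 6, 3]]    # third column counts rejected samples

nmi_perfect = normalized_mutual_information(perfect)
nmi_majority = normalized_mutual_information(majority_only)
nmi_rejects = normalized_mutual_information(with_rejects)
```

In this sketch the majority-only classifier scores zero even though its accuracy is 90%, because its decisions carry no information about the targets. That behavior is one reason an information-based criterion can be attractive under class imbalance when no costs are supplied.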