• DocumentCode
    3154618
  • Title
    Clustering-based binary-class classification for imbalanced data sets
  • Author
    Chen, Chao; Shyu, Mei-Ling
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Miami, Coral Gables, FL, USA
  • fYear
    2011
  • fDate
    3-5 Aug. 2011
  • Firstpage
    384
  • Lastpage
    389
  • Abstract
    In this paper, we propose a new clustering-based binary-class classification framework that integrates a clustering technique into a binary-class classification approach to handle imbalanced data sets. A binary-class classifier is designed to classify a set of data instances into two classes, while a clustering technique partitions the data instances into groups according to their similarity to each other. After a clustering algorithm is applied, data instances within the same group usually have a higher similarity to one another, and the differences between data instances in different groups should be larger. In our proposed framework, all negative data instances are first clustered into a set of negative groups. Next, the negative data instances in each negative group are combined with all positive data instances to construct a balanced binary-class data set. Finally, the subspace models trained on these balanced binary-class data sets are integrated with the subspace model trained on the original imbalanced data set to form the proposed classification model. Experimental results demonstrate that the proposed classification framework performs better than the comparative classification approaches, and better than the subspace modeling method trained on the original data set alone.
  • Keywords
    learning (artificial intelligence); pattern classification; pattern clustering; binary-class classifier; clustering-based binary-class classification; imbalanced data sets; negative data instances; positive data instances; subspace models; Data models; Optimized production technology; Support vector machines; Testing; Training; Training data; Videos; Binary classification; Clustering; Imbalanced data sets; Subspace Modeling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2011 IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4577-0964-7
  • Electronic_ISBN
    978-1-4577-0965-4
  • Type
    conf
  • DOI
    10.1109/IRI.2011.6009578
  • Filename
    6009578
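
The three steps described in the abstract (cluster the negatives, pair each negative group with all positives, combine the per-group models with a model trained on the full data set) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not describe the subspace models or the integration rule, so the hypothetical `CentroidModel` stands in for a subspace model, plain k-means stands in for the unspecified clustering algorithm, and taking the minimum score across models is an assumed way to integrate them.

```python
import random
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


class CentroidModel:
    """Hypothetical stand-in for a subspace model (NOT the paper's model)."""

    def __init__(self, positives, negatives):
        self.pos = mean(positives)
        self.neg = mean(negatives)

    def score(self, x):
        # Positive when x lies closer to the positive centroid.
        return dist(x, self.neg) - dist(x, self.pos)


def train_framework(positives, negatives, k):
    # Steps 1 and 2: cluster the negatives, then pair each group with all positives.
    models = [CentroidModel(positives, group) for group in kmeans(negatives, k)]
    # Step 3: add the model trained on the original imbalanced data set.
    models.append(CentroidModel(positives, negatives))
    return models


def predict(models, x):
    # Assumed integration rule: label x positive only if every model agrees.
    return 1 if min(m.score(x) for m in models) > 0 else 0
```

Calling `train_framework(positives, negatives, k=3)` on lists of coordinate tuples returns at most `k + 1` models, and `predict(models, x)` returns 1 for the positive class and 0 for the negative class. The minimum-score rule is a conservative choice made for this sketch; averaging or weighting the scores would be equally plausible readings of "integrated".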