• DocumentCode
    1798091
  • Title
    Coupled fuzzy k-nearest neighbors classification of imbalanced non-IID categorical data
  • Author
    Chunming Liu ; Longbing Cao ; Philip S. Yu
  • Author_Institution
    Adv. Analytics Inst., Univ. of Technol. Sydney, Sydney, NSW, Australia
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    1122
  • Lastpage
    1129
  • Abstract
    Mining imbalanced data has recently received increasing attention because of its challenges and its wide range of real-world applications. Most existing work focuses on numerical data, either by manipulating the data structure, which essentially changes the data characteristics, or by developing new distance or similarity measures designed for data under the so-called IID assumption, namely that data are independent and identically distributed. This is inconsistent with real-life data and business needs, which require fully respecting the data structure and the coupling relationships embedded in data objects, features, and feature values. In this paper, we propose a novel coupled fuzzy similarity-based classification approach that captures the differences between classes through fuzzy membership and the couplings through coupled object similarity, and incorporates both into the popular kNN classifier to form a coupled fuzzy kNN (CF-kNN). We test the approach on 14 categorical data sets against several kNN variants and classic classifiers, including C4.5 and NaiveBayes. The experimental results show that CF-kNN outperforms the baselines, and that classifiers incorporating the proposed coupled fuzzy similarity perform better than their original versions.
  • Keywords
    data mining; fuzzy set theory; pattern classification; CF-kNN; categorical data sets; classic classifier; coupled fuzzy k-nearest neighbor classification; coupled fuzzy kNN; coupled object similarity; coupling relationships; data characteristics; data objects; data structure manipulation; distance measures; feature values; fuzzy membership; fuzzy similarity-based classification approach; imbalanced data mining; imbalanced non-IID categorical data; numerical data; similarity measures; Algorithm design and analysis; Couplings; Data mining; Distributed databases; Equations; Feature extraction; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type
    conf
  • DOI
    10.1109/IJCNN.2014.6889773
  • Filename
    6889773