Title :
A hybrid coupled k-nearest neighbor algorithm on imbalance data
Author :
Chunming Liu ; Longbing Cao ; Yu, Philip S.
Author_Institution :
Adv. Analytics Inst., Univ. of Technol. Sydney, Sydney, NSW, Australia
Abstract :
State-of-the-art classification algorithms rarely consider the relationships between the attributes in a data set and instead assume that the attributes are independent of each other (the IID assumption). In real-world data, however, attributes interact to some degree through explicit or implicit relationships. Although classifiers for class-balanced data are relatively well developed, classifying class-imbalanced data is not straightforward, especially for mixed-type data that has both categorical and numerical features. Limited research has been conducted on class-imbalanced data. Some algorithms synthesize or remove instances to force the class sizes to be comparable, which may change the inherent data structure or introduce noise into the source data. Distance-based and similarity-based algorithms, in turn, ignore the relationships between features when computing similarity. This paper proposes a hybrid coupled k-nearest neighbor classification algorithm (HC-kNN) for mixed-type data. It discretizes the numerical features so that the inter-coupling similarity used for categorical features can also be applied to them, and then combines this coupled similarity with the original similarity or distance to overcome the shortcomings of the previous algorithms. Experimental results show that the proposed algorithm achieves higher average performance than related algorithms (e.g., variants of kNN, Decision Tree, SMOTE, and NaiveBayes).
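The pipeline outlined in the abstract (discretize numerical features, compute a coupling-aware similarity, blend it with a plain similarity, then vote among the k nearest neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the equal-width binning, the co-occurrence-based coupling measure, the overlap base similarity, the mixing weight `alpha`, and all function names are hypothetical choices made for this example.

```python
from collections import Counter, defaultdict

def discretize(values, n_bins=3):
    """Equal-width binning of one numerical column (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def conditional_dists(rows, j, k):
    """Estimate P(feature k = v | feature j = a) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[j]][r[k]] += 1
    return {a: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for a, cnt in counts.items()}

def coupled_value_similarity(rows, j, a, b):
    """Assumed inter-feature coupling: overlap of the distributions that
    values a and b of feature j induce on every other feature."""
    others = [k for k in range(len(rows[0])) if k != j]
    if not others:
        return float(a == b)
    total = 0.0
    for k in others:
        dists = conditional_dists(rows, j, k)
        pa, pb = dists.get(a, {}), dists.get(b, {})
        total += sum(min(pa.get(v, 0.0), pb.get(v, 0.0))
                     for v in set(pa) | set(pb))
    return total / len(others)

def hybrid_similarity(rows, x, y, alpha=0.5):
    """Blend a plain overlap similarity with the coupled similarity.
    The overlap measure stands in for the 'original similarity or distance'."""
    n = len(x)
    base = sum(x[j] == y[j] for j in range(n)) / n
    coupled = sum(coupled_value_similarity(rows, j, x[j], y[j])
                  for j in range(n)) / n
    return alpha * base + (1 - alpha) * coupled

def knn_predict(train_rows, train_labels, query, k=3, alpha=0.5):
    """Similarity-weighted vote among the k most similar training rows."""
    sims = [hybrid_similarity(train_rows, row, query, alpha)
            for row in train_rows]
    top = sorted(zip(sims, train_labels), key=lambda t: t[0], reverse=True)[:k]
    votes = Counter()
    for sim, label in top:
        votes[label] += sim
    return votes.most_common(1)[0][0]

# Toy mixed-type data: one categorical and one numerical feature.
colors = ["red", "red", "blue", "blue", "red", "blue"]
sizes = [1.0, 1.2, 5.0, 5.5, 0.9, 4.8]
labels = ["minor", "minor", "major", "major", "minor", "major"]
rows = list(zip(colors, discretize(sizes, n_bins=2)))
prediction = knn_predict(rows, labels, ("red", 0), k=3)
```

The sketch recomputes the co-occurrence statistics on every call for readability; caching them per feature pair would be the obvious first optimization. It also does nothing specific about class imbalance beyond the similarity itself, since the abstract does not describe such a step.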
Keywords :
pattern classification; HC-kNN; IID; categorical features; class-balanced data; class-imbalanced data; classifier; data structure; distance based algorithm; explicit relationship; hybrid coupled k-nearest neighbor classification algorithm; imbalance data; implicit relationship; mixed type data; numerical features; similarity based algorithm; Algorithm design and analysis; Classification algorithms; Clouds; Couplings; Size measurement; Training; Training data;
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889798