Title :
Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition
Author :
Sandhan, T. ; Jin Young Choi
Author_Institution :
Dept. of Electr. & Comput. Eng., ASRI Seoul Nat. Univ., Seoul, South Korea
Abstract :
High class imbalance in real-world domains is a direct result of the rarity of interesting events, which produces skewed datasets. Without dataset rebalancing, a learning algorithm encounters very few minority-class samples and therefore becomes biased towards the majority class in classification tasks. Hence, properly handling imbalanced datasets is a crucial issue in the pattern recognition domain. We employ bootstrapping with simultaneous oversampling of the minority class and undersampling of the majority class to build an ensemble of classifiers. Oversampling is partially guided by hidden patterns extracted from the minority class, which prevents its over-generalization and amplifies subtle but vital patterns. The proposed framework is evaluated on four highly imbalanced datasets using a series of classifiers: support vector machine, logistic regression, nearest neighbor, and Gaussian process classifier. Experimental results show that pattern classification performance on the various tasks improves after rebalancing the datasets with the proposed framework.
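The hybrid sampling idea in the abstract (oversample the minority class, undersample the majority class, train one classifier per rebalanced bootstrap set, then combine them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how oversampling is guided by extracted patterns, so plain random oversampling with replacement is substituted here. The function names (`hybrid_bootstrap`, `predict_1nn`, `ensemble_predict`), the `target_size` parameter, the majority-vote combination, and the toy data are all assumptions made for illustration; a hand-written 1-nearest-neighbor rule stands in for the classifiers listed in the abstract.

```python
import random
from collections import Counter


def hybrid_bootstrap(X, y, minority_label, target_size, rng):
    """Draw one rebalanced training set with target_size samples per class."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    # Oversample the minority class: draw with replacement.
    # (Assumption: unguided random draws replace the paper's guided step.)
    over = [rng.choice(minority) for _ in range(target_size)]
    # Undersample the majority class: draw without replacement.
    under = rng.sample(majority, min(target_size, len(majority)))
    idx = over + under
    return [X[i] for i in idx], [y[i] for i in idx]


def predict_1nn(train_X, train_y, x):
    """Return the label of the nearest training point (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(p, x)) for p in train_X]
    return train_y[dists.index(min(dists))]


def ensemble_predict(bags, x):
    """Majority vote over one 1-NN classifier per rebalanced bag."""
    votes = Counter(predict_1nn(bx, by, x) for bx, by in bags)
    return votes.most_common(1)[0][0]


# Toy imbalanced data: 100 majority points (label 0) on a grid near the
# origin, 5 minority points (label 1) near (5, 5).
X = [(0.1 * (i % 10), 0.1 * (i // 10)) for i in range(100)]
X += [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1), (4.8, 4.8)]
y = [0] * 100 + [1] * 5

rng = random.Random(0)
# Build an ensemble of 11 rebalanced bags, each with 20 samples per class.
bags = [hybrid_bootstrap(X, y, 1, 20, rng) for _ in range(11)]

print(ensemble_predict(bags, (5.0, 5.0)))  # query near the minority cluster
print(ensemble_predict(bags, (0.5, 0.5)))  # query inside the majority grid
```

Each bag is balanced by construction, so no single ensemble member sees the original 20:1 skew. In this sketch the minority class is only duplicated, which is exactly the over-generalization risk the abstract says guided oversampling is meant to address.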
Keywords :
Gaussian processes; learning (artificial intelligence); pattern classification; regression analysis; sampling methods; support vector machines; Gaussian process classifier; classification tasks; dataset rebalancing; extracted hidden patterns; extremely low minority class samples; imbalanced datasets; learning algorithm; logistic regression; nearest neighbor; partially guided hybrid sampling; pattern recognition; real-world domains; skewed datasets; support vector machine; Databases; Gaussian processes; Kernel; Pattern recognition; Proteins; Support vector machines; Vectors; Imbalanced dataset; Sat-image classification; bootstrapping; ensemble classifier; medical diagnoses; protein classification;
Conference_Title :
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location :
Stockholm
DOI :
10.1109/ICPR.2014.258