DocumentCode
177914
Title
Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition
Author
Sandhan, T. ; Jin Young Choi
Author_Institution
Dept. of Electr. & Comput. Eng., ASRI Seoul Nat. Univ., Seoul, South Korea
fYear
2014
fDate
24-28 Aug. 2014
Firstpage
1449
Lastpage
1453
Abstract
High class imbalance in real-world domains is a direct result of the rarity of interesting events, which leads to skewed datasets. Without dataset rebalancing, the learning algorithm encounters extremely few minority class samples and therefore becomes biased towards the majority class in classification tasks. Hence, properly handling imbalanced datasets is a crucial issue in the pattern recognition domain. We employ bootstrapping with simultaneous oversampling of the minority class and undersampling of the majority class to build an ensemble of classifiers. Oversampling is partially guided by hidden patterns extracted from the minority class, which prevents over-generalization of that class and amplifies subtle but vital patterns. The proposed framework is evaluated on four highly imbalanced datasets using a series of classifiers: support vector machine, logistic regression, nearest neighbor and Gaussian process classifier. Experimental results show that pattern classification performance on various tasks improves after rebalancing the datasets with the proposed framework.
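The hybrid sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how hidden patterns guide the oversampling, so plain random bootstrapping stands in for that step. The function names, the midpoint target size, the number of ensemble members and the majority-vote rule are all assumptions made for illustration.

```python
import random
from collections import Counter


def hybrid_resample(minority, majority, target_size, rng):
    """Return one rebalanced replicate as (oversampled, undersampled).

    Minority samples are drawn with replacement (oversampling) and majority
    samples without replacement (undersampling). Unguided random draws are an
    assumption standing in for the paper's partially guided oversampling.
    """
    over = [rng.choice(minority) for _ in range(target_size)]
    under = rng.sample(majority, min(target_size, len(majority)))
    return over, under


def nearest_neighbor_predict(train, x):
    """1-NN over (features, label) pairs using squared Euclidean distance."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda item: dist(item[0], x))[1]


def build_ensemble(minority, majority, n_members=5, target_size=None, seed=0):
    """Build one rebalanced training set per ensemble member."""
    rng = random.Random(seed)
    if target_size is None:
        # Assumed target: midpoint between the two class sizes.
        target_size = (len(minority) + len(majority)) // 2
    ensemble = []
    for _ in range(n_members):
        over, under = hybrid_resample(minority, majority, target_size, rng)
        # Label 1 marks the minority class, label 0 the majority class.
        ensemble.append([(x, 1) for x in over] + [(x, 0) for x in under])
    return ensemble


def ensemble_predict(ensemble, x):
    """Majority vote over the per-member nearest-neighbor predictions."""
    votes = Counter(nearest_neighbor_predict(train, x) for train in ensemble)
    return votes.most_common(1)[0][0]
```

Each member sees a different balanced replicate, so the vote averages out the sampling noise of any single draw. Nearest neighbor is used here only because it is one of the classifiers named in the abstract and needs no external library.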
Keywords
Gaussian processes; learning (artificial intelligence); pattern classification; regression analysis; sampling methods; support vector machines; Gaussian process classifier; classification tasks; dataset rebalancing; extracted hidden patterns; extremely low minority class samples; imbalanced datasets; learning algorithm; logistic regression; nearest neighbor; partially guided hybrid sampling; pattern recognition; real-world domains; skewed datasets; support vector machine; Databases; Gaussian processes; Kernel; Pattern recognition; Proteins; Support vector machines; Vectors; Imbalanced dataset; Sat-image classification; bootstrapping; ensemble classifier; medical diagnoses; protein classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition (ICPR), 2014 22nd International Conference on
Conference_Location
Stockholm
ISSN
1051-4651
Type
conf
DOI
10.1109/ICPR.2014.258
Filename
6976968