• DocumentCode
    177914
  • Title
    Handling Imbalanced Datasets by Partially Guided Hybrid Sampling for Pattern Recognition
  • Author
    Sandhan, T. ; Jin Young Choi
  • Author_Institution
    Dept. of Electr. & Comput. Eng., ASRI Seoul Nat. Univ., Seoul, South Korea
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    1449
  • Lastpage
    1453
  • Abstract
    High class imbalance in real-world domains is a direct result of the rarity of interesting events, which produces skewed datasets. Without dataset rebalancing, the learning algorithm encounters extremely few minority-class samples and therefore becomes biased towards the majority class in classification tasks. Hence, properly handling imbalanced datasets is a crucial issue in the pattern recognition domain. We employ bootstrapping with simultaneous oversampling of the minority class and undersampling of the majority class to build an ensemble of classifiers. Oversampling is partially guided by hidden patterns extracted from the minority class, which prevents over-generalization and amplifies subtle but vital patterns. The proposed framework is evaluated on four highly imbalanced datasets using a series of classifiers: support vector machine, logistic regression, nearest neighbor, and Gaussian process classifier. Experimental results show that pattern classification performance on various tasks improves after rebalancing the datasets with the proposed framework.
  • Keywords
    Gaussian processes; learning (artificial intelligence); pattern classification; regression analysis; sampling methods; support vector machines; Gaussian process classifier; classification tasks; dataset rebalancing; extracted hidden patterns; extremely low minority class samples; imbalanced datasets; learning algorithm; logistic regression; nearest neighbor; partially guided hybrid sampling; pattern recognition; real-world domains; skewed datasets; support vector machine; Databases; Gaussian processes; Kernel; Pattern recognition; Proteins; Support vector machines; Vectors; Imbalanced dataset; Sat-image classification; bootstrapping; ensemble classifier; medical diagnoses; protein classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.258
  • Filename
    6976968
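
The abstract describes building an ensemble of classifiers from bootstrap replicates that oversample the minority class and undersample the majority class at the same time. The sketch below illustrates only that hybrid resampling idea, under stated assumptions. It uses plain random resampling with replacement, not the paper's partially guided oversampling, because this record gives no detail on how the hidden minority-class patterns are extracted. The names hybrid_bootstrap, predict_1nn, and ensemble_predict, and the parameters n_per_class and n_models, are hypothetical and chosen for illustration; the 1-nearest-neighbor base learner is a stand-in for the classifiers listed in the abstract.

```python
import random
from collections import Counter


def hybrid_bootstrap(X, y, n_per_class, rng):
    """Draw one balanced bootstrap replicate (illustrative, unguided).

    Every class is sampled with replacement to exactly n_per_class points,
    so a class smaller than n_per_class is oversampled and a class larger
    than n_per_class is undersampled.
    """
    Xb, yb = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.choices(idx, k=n_per_class):
            Xb.append(X[i])
            yb.append(label)
    return Xb, yb


def predict_1nn(X_train, y_train, x):
    """Return the label of the training point closest to x (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[dists.index(min(dists))]


def ensemble_predict(X, y, x, n_models=11, n_per_class=20, seed=0):
    """Majority vote over base learners, each using its own balanced replicate."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        Xb, yb = hybrid_bootstrap(X, y, n_per_class, rng)
        votes.append(predict_1nn(Xb, yb, x))
    return Counter(votes).most_common(1)[0][0]
```

Calling ensemble_predict(X, y, x) with X as a list of numeric feature lists and y as the matching class labels returns the label chosen by most ensemble members. Setting n_per_class between the minority and majority class sizes is what makes a replicate oversample one class and undersample the other; that choice is an assumption of this sketch, not a value taken from the paper.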