• DocumentCode
    3079101
  • Title
    Investigating the Effect of Sampling Methods for Imbalanced Data Distributions
  • Author
    Yen, Show-Jane ; Lee, Yue-Shi ; Lin, Cheng-Han ; Ying, Jia-Ching
  • Volume
    5
  • fYear
    2006
  • fDate
    8-11 Oct. 2006
  • Firstpage
    4163
  • Lastpage
    4168
  • Abstract
    Classification is an important and well-known technique in machine learning, and the training data significantly influence classification accuracy. However, training data in real-world applications often have an imbalanced class distribution, so selecting suitable training data is important for classification under imbalanced class distributions. In this paper, we propose a cluster-based sampling approach that selects representative data as training data to improve classification accuracy, and we investigate the effect of under-sampling methods on the imbalanced class distribution problem. In the experiments, we compare the performance of our cluster-based sampling approach with that of other sampling methods from previous studies.
  • Keywords
    backpropagation; neural nets; pattern classification; pattern clustering; sampling methods; backpropagation neural network; imbalanced data distribution; machine learning; pattern classification; pattern clustering; sampling method; Accuracy; Costs; Credit cards; Cybernetics; Finance; Machine learning; Neural networks; Performance evaluation; Sampling methods; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
  • Conference_Location
    Taipei
  • Print_ISBN
    1-4244-0099-6
  • Electronic_ISBN
    1-4244-0100-3
  • Type
    conf
  • DOI
    10.1109/ICSMC.2006.384787
  • Filename
    4274552