• DocumentCode
    3756307
  • Title

    Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification

  • Author

    Sheng Guan;Min Chen;Hsin-Yu Ha;Shu-Ching Chen;Mei-Ling Shyu;Chengde Zhang

  • Author_Institution
    Sch. of Comput. &
  • fYear
    2015
  • Firstpage
    288
  • Lastpage
    295
  • Abstract
    In this paper, we propose an extended deep learning approach that incorporates instance selection and bootstrapping techniques for imbalanced data classification. In supervised learning, classification performance often deteriorates when the training set is imbalanced, that is, when at least one class has substantially fewer instances than the others. We propose to use the adaptive synthetic sampling approach (ADASYN) to generate synthetic instances for the minority class. A data pruning process based on multiple correspondence analysis (MCA) is then performed to identify the subset of synthetic instances that are most suitable to supplement the existing minority instances. This results in a relatively more balanced training dataset, which is then bootstrapped and fed into convolutional neural networks (CNNs) for classification. Furthermore, we propose to use low-level features pre-processed by principal component analysis (PCA), instead of the commonly used raw signal data, as the input to the CNNs to reduce computation time. Experimental results on classifying 54 TRECVID concepts with different degrees of imbalance show the effectiveness of our framework in comparison with other state-of-the-art methods.
  • Keywords
    "Training","Feature extraction","Machine learning","Neurons","Multimedia communication","Biological neural networks","Principal component analysis"
  • Publisher
    IEEE
  • Conference_Titel
    Collaboration and Internet Computing (CIC), 2015 IEEE Conference on
  • Type

    conf

  • DOI
    10.1109/CIC.2015.40
  • Filename
    7423094
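
  • Note
    The abstract says the more balanced training set is bootstrapped before being fed to the CNNs, but it does not say how the resampling is done. The sketch below is only an illustration under stated assumptions: it draws the same number of instances per class, with replacement, from whatever pool is supplied (for example, original instances plus the retained synthetic ones). The function name `class_balanced_bootstrap` and the parameters `per_class` and `seed` are hypothetical and do not come from the paper.

```python
import random


def class_balanced_bootstrap(instances, labels, per_class, seed=0):
    """Draw `per_class` instances with replacement from each class.

    Assumption (not from the paper): every class contributes the same
    number of draws, so the resampled set is exactly balanced.
    """
    if len(instances) != len(labels):
        raise ValueError("instances and labels must have the same length")
    if per_class < 1:
        raise ValueError("per_class must be at least 1")

    rng = random.Random(seed)

    # Group the instances by class label.
    by_class = {}
    for x, y in zip(instances, labels):
        by_class.setdefault(y, []).append(x)

    # Sample with replacement inside each class (the bootstrap step).
    sampled_x, sampled_y = [], []
    for y in sorted(by_class):
        pool = by_class[y]
        for _ in range(per_class):
            sampled_x.append(pool[rng.randrange(len(pool))])
            sampled_y.append(y)
    return sampled_x, sampled_y


if __name__ == "__main__":
    # Toy data: three majority-class (0) instances, one minority-class (1).
    features = [[0.1], [0.2], [0.3], [0.9]]
    targets = [0, 0, 0, 1]
    xs, ys = class_balanced_bootstrap(features, targets, per_class=5, seed=1)
    print(len(xs), ys.count(0), ys.count(1))
```

    Sampling within each class keeps the minority class from being drowned out again by the resampling itself; a plain bootstrap over the whole pool would instead preserve whatever imbalance remains after pruning.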