• DocumentCode
    2710497
  • Title
    Diversity exploration and negative correlation learning on imbalanced data sets
  • Author
    Wang, Shuo; Tang, Ke; Yao, Xin
  • Author_Institution
    Sch. of Comput. Sci., Univ. of Birmingham, Birmingham, UK
  • fYear
    2009
  • fDate
    14-19 June 2009
  • Firstpage
    3259
  • Lastpage
    3266
  • Abstract
    Class imbalance learning is an important research area in machine learning that deals with data sets in which instances of some classes heavily outnumber those of other classes. Such a skewed class distribution degrades classification performance. Several ensemble solutions have been proposed for the class imbalance problem. Diversity, which describes the degree to which the classifiers in an ensemble make different decisions, has been shown to be an influential factor in ensemble learning. However, none of the proposed solutions explores the impact of diversity on imbalanced data sets. In addition, most of them rely on re-sampling techniques to rebalance the class distribution, and over-sampling usually causes overfitting (high generalisation error). This paper investigates whether diversity can relieve this problem by using the negative correlation learning (NCL) model, which encourages diversity explicitly by adding a penalty term to the error function of neural networks. A variant of NCL, NCLCost, is also proposed. Our study shows that diversity has a direct impact on recall, and that it is also a factor in the reduction of F-measure. In addition, although NCL-based models with extreme settings do not achieve better minority-class recall than SMOTEBoost [1], they achieve slightly better F-measure and G-mean than both independent ANNs and SMOTEBoost, and better recall than independent ANNs.
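    The NCL penalty and the evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard NCL formulation in which the ensemble output is the simple average F_bar of the M network outputs F_i, each network minimises 0.5*(F_i - d)^2 + lambda*p_i with p_i = (F_i - F_bar) * sum over j != i of (F_j - F_bar), and the gradient treats F_bar as constant. It also assumes the common definitions of F-measure (harmonic mean of minority-class precision and recall) and G-mean (square root of minority-class recall times majority-class recall). The function names `ncl_errors` and `imbalance_metrics` are hypothetical, and NCLCost is not shown because its definition is not given here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ncl_errors(outputs: Sequence[float], target: float,
               lam: float) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical helper: per-network NCL error for ONE training example.

    outputs: F_i(n) for each of the M networks; target: d(n); lam: penalty strength.
    Returns (errors, penalties, grads), where grads[i] is dE_i/dF_i under the usual
    simplification that the ensemble mean F_bar is treated as a constant.
    """
    f_bar = sum(outputs) / len(outputs)  # assumed: ensemble output = simple average
    total_dev = sum(f_j - f_bar for f_j in outputs)  # zero up to rounding
    errors, penalties, grads = [], [], []
    for f_i in outputs:
        dev_i = f_i - f_bar
        # p_i = (F_i - F_bar) * sum_{j != i} (F_j - F_bar), which equals -(F_i - F_bar)^2
        p_i = dev_i * (total_dev - dev_i)
        penalties.append(p_i)
        errors.append(0.5 * (f_i - target) ** 2 + lam * p_i)
        # Negative penalty rewards deviating from the ensemble mean (diversity)
        grads.append((f_i - target) - lam * dev_i)
    return errors, penalties, grads


def imbalance_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                      positive: int = 1) -> Dict[str, float]:
    """Hypothetical helper: minority-class recall, precision, F-measure and G-mean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0  # majority-class recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "g_mean": g_mean}


if __name__ == "__main__":
    errs, pens, grads = ncl_errors([0.2, 0.6, 1.0], target=1.0, lam=0.5)
    print(errs, pens, grads)
    print(imbalance_metrics([1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                            [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]))
```

    With lam = 0 each network is trained independently; larger lam strengthens the diversity pressure, which is the knob whose effect on recall, F-measure and G-mean the abstract reports.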
  • Keywords
    learning (artificial intelligence); neural nets; pattern classification; sampling methods; NCL; class imbalance problem; ensemble learning; imbalanced data set; machine learning; negative correlation learning; neural network; re-sampling technique; unbalanced class distribution; variation model; Bagging; Boosting; Classification tree analysis; Decision trees; Degradation; Intrusion detection; Learning systems; Machine learning; Neural networks; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Neural Networks, 2009. IJCNN 2009. International Joint Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-3548-7
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2009.5178836
  • Filename
    5178836