• DocumentCode
    1905783
  • Title
    Improved Self-Generating Prototypes Algorithm for Imbalanced Datasets
  • Author
    Oliveira, D.V.R. ; Magalhaes, G.R. ; Cavalcanti, G.D.C. ; Ren, T.I.
  • Author_Institution
    Center for Inf., Fed. Univ. of Pernambuco, Recife, Brazil
  • Volume
    1
  • fYear
    2012
  • fDate
    7-9 Nov. 2012
  • Firstpage
    904
  • Lastpage
    909
  • Abstract
    Some real-world datasets have uneven class proportions, with many instances of the majority classes and only a few of the minority classes; these are called imbalanced datasets. Many applications, such as medical diagnosis and risk analysis, are interested in the under-represented class, but classifiers and prototype generation techniques are usually biased towards the majority classes. For this reason, classification with imbalanced datasets has become an important topic in Pattern Recognition. The Self-Generating Prototypes (SGP) algorithm has high reduction power and excellent performance on balanced datasets, but on imbalanced datasets the generated prototypes do not represent the training dataset well: the algorithm generates many prototypes of the majority classes and only a few, or even none, of the minority classes. This paper proposes the Adaptive Self-Generating Prototypes (ASGP) algorithm, an improvement of SGP2 (the second version of SGP) designed to handle imbalanced datasets. It also explains why SGP2 performs poorly on such datasets. Empirical results show that ASGP outperforms SGP2 on imbalanced datasets, especially in classification accuracy on the minority classes.
  • Keywords
    data mining; pattern classification; ASGP; SGP2; adaptive self-generating prototype algorithm; classification accuracy; classifier; imbalanced dataset; majority class; medical diagnosis; minority class; pattern recognition; real world dataset; reduction power; risk analysis; training dataset; under-represented class; Accuracy; Algorithm design and analysis; Medical diagnosis; Noise; Prediction algorithms; Prototypes; Training; Adaptive Self-Generating Prototypes (ASGP); Classification; Imbalanced Datasets; Prototype Generation (PG); Self-Generating Prototypes (SGP)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
  • Conference_Location
    Athens
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4799-0227-9
  • Type
    conf
  • DOI
    10.1109/ICTAI.2012.126
  • Filename
    6495140