  • DocumentCode
    2291831
  • Title
    SkewBoost: An algorithm for classifying imbalanced datasets
  • Author
    Hukerikar, Saumil ; Tumma, Ashwin ; Nikam, Akshay ; Attar, Vahida
  • Author_Institution
    Dept. of Comput. Eng. & Inf. Technol., Coll. of Eng. Pune, Pune, India
  • fYear
    2011
  • fDate
    15-17 Sept. 2011
  • Firstpage
    46
  • Lastpage
    52
  • Abstract
    Many real-world data sets have an imbalanced distribution of instances. Learning from such data sets biases the classifier towards the majority class, so it tends to misclassify minority class samples. In this paper, we present a technique, SkewBoost, which classifies the minority instances correctly without compromising much on the correct classification of the majority instances. In the SkewBoost technique, minority and majority instances are identified during execution of the boosting algorithm. A variation of SMOTE is used to create synthetic minority instances, which are then added to the training set, and the total weight is rebalanced. After each iteration of the boosting algorithm, the weight of each instance is modified to focus more on the misclassified instances. A cost-sensitive approach is adopted to reweight the instances after every iteration. The method is evaluated in terms of F-measure, G-mean, AUC, recall and precision on imbalanced data sets, against previously published results for algorithms on imbalanced data sets.
  • Keywords
    data handling; iterative methods; learning (artificial intelligence); pattern classification; AUC; F-measure; G-mean; SMOTE variation; SkewBoost technique; boosting algorithm; cost-sensitive approach; imbalanced dataset classification algorithm; precision; recall; synthetic minority instances; Accuracy; Boosting; Classification algorithms; Data mining; Measurement; boosting; instance weights; minority class; over sampling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Communication Technology (ICCCT), 2011 2nd International Conference on
  • Conference_Location
    Allahabad
  • Print_ISBN
    978-1-4577-1385-9
  • Type
    conf
  • DOI
    10.1109/ICCCT.2011.6075185
  • Filename
    6075185
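
The abstract above mentions creating synthetic minority instances, rebalancing the total weight, and evaluating with precision, recall, F-measure and G-mean. The following is a minimal sketch of those three ideas only, not the authors' implementation: it uses plain SMOTE-style interpolation rather than the paper's SMOTE variation, and the function names, the choice to give each synthetic instance the mean existing weight, and the parameters `k` and `seed` are all assumptions made for illustration. The cost-sensitive reweighting step and AUC are not covered, since the record gives no details on them.

```python
import math
import random


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Plain SMOTE-style interpolation (assumed; not the paper's variation).

    Each synthetic point lies on the segment between a randomly chosen
    minority instance and one of its k nearest minority neighbours.
    Requires at least two minority instances.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def rebalance_weights(weights, n_new):
    """Append weights for n_new synthetic instances and renormalise to sum to 1.

    Assumption: each synthetic instance starts with the mean existing weight.
    """
    mean_w = sum(weights) / len(weights)
    extended = list(weights) + [mean_w] * n_new
    total = sum(extended)
    return [w / total for w in extended]


def imbalance_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure and G-mean, with the minority class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }
```

G-mean is computed here as the geometric mean of recall on the minority class and recall on the majority class (specificity), which is the usual definition for two-class imbalanced problems; the record itself does not state which definition the paper uses.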