• DocumentCode
    3696639
  • Title
    Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm
  • Author
    Phakhawat Sarakit;Thanaruk Theeramunkong;Choochart Haruechaiyasak
  • Author_Institution
    School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The imbalanced dataset problem degrades classification performance in several data mining applications, including pattern recognition, text categorization, and information filtering. To improve emotion classification performance, we use a sampling-based algorithm called SMOTE, which oversamples the instances of a minority class until they match the number of instances in the majority class. A YouTube dataset was balanced using the SMOTE technique and tested with three machine learning algorithms: multinomial Naïve Bayes (MNB), decision tree (DT), and support vector machines (SVM). SVM achieves the highest accuracy, with 93.30% on the filtering task and 89.44% on the classification task. The SMOTE technique can solve the imbalanced data problem and yield improved classification results.
  • Keywords
    "Support vector machines","Accuracy","Filtering","Classification algorithms","Decision trees","YouTube","Machine learning algorithms"
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015 2nd International Conference on
  • Print_ISBN
    978-1-4673-8142-0
  • Type
    conf
  • DOI
    10.1109/ICAICTA.2015.7335373
  • Filename
    7335373
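
The oversampling step named in the abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic minority instance is placed at a random point on the line segment between a minority instance and one of its nearest minority neighbours. This record does not describe the paper's feature representation, neighbour count, or tooling, so the dense numeric vectors, the default `k=5`, and the function names `smote` and `balance` below are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (base itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(majority, minority, k=5, seed=0):
    """Oversample the minority class up to the size of the majority class."""
    n_new = len(majority) - len(minority)
    if n_new <= 0:
        return list(minority)
    return list(minority) + smote(minority, n_new, k, seed)


# Toy data (made up for this example): 10 majority vs. 3 minority points.
majority = [[float(i), float(i)] for i in range(10)]
minority = [[0.0, 10.0], [1.0, 11.0], [2.0, 10.0]]
balanced = balance(majority, minority)
print(len(majority), len(balanced))
```

Because every synthetic point lies between two existing minority points, the new instances stay inside the region the minority class already occupies rather than duplicating instances exactly. In practice a tested library implementation would normally be preferred over a hand-written one.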