DocumentCode
3696639
Title
Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm
Author
Phakhawat Sarakit;Thanaruk Theeramunkong;Choochart Haruechaiyasak
Author_Institution
School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Thailand
fYear
2015
Firstpage
1
Lastpage
5
Abstract
Class imbalance degrades classification performance in several data mining applications, including pattern recognition, text categorization, and information filtering. To improve emotion classification performance, we use a sampling-based algorithm called SMOTE, which oversamples instances of a minority class until they match the number of instances in the majority class. A YouTube dataset was balanced using the SMOTE technique and tested with three machine learning algorithms: multinomial Naïve Bayes (MNB), decision tree (DT), and support vector machines (SVM). SVM achieves the highest accuracy, with 93.30% on the filtering task and 89.44% on the classification task. These results indicate that the SMOTE technique can address the imbalanced data problem and improve classification results.
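The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, its parameters (`n_target`, `k`, `seed`), and the toy 2-D feature vectors are assumptions made for illustration, and the paper's actual text features are not reproduced here. The sketch follows the general SMOTE idea of creating a synthetic point by interpolating between a minority sample and one of its k nearest minority neighbours, repeating until the minority class reaches the size of the majority class.

```python
import random


def smote(minority, n_target, k=5, seed=0):
    """Return the minority rows plus synthetic rows, n_target rows in total.

    Hypothetical helper for illustration only; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    while len(minority) + len(synthetic) < n_target:
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest to `base` first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return [list(p) for p in minority] + synthetic


# Toy data (assumed): six majority rows and three minority rows.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

balanced_minority = smote(minority, n_target=len(majority), k=2)
print(len(majority), len(balanced_minority))  # prints "6 6"
```

Because each synthetic row lies on a line segment between two existing minority rows, the new rows stay inside the region already occupied by the minority class instead of duplicating existing rows exactly. In an evaluation setting, oversampling would normally be applied to the training split only, so that synthetic rows do not leak into the test data.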
Keywords
"Support vector machines","Accuracy","Filtering","Classification algorithms","Decision trees","YouTube","Machine learning algorithms"
Publisher
IEEE
Conference_Titel
Advanced Informatics: Concepts, Theory and Applications (ICAICTA), 2015 2nd International Conference on
Print_ISBN
978-1-4673-8142-0
Type
conf
DOI
10.1109/ICAICTA.2015.7335373
Filename
7335373