Title :
Solving unbalanced data for Thai sentiment analysis
Author :
Wunnasri, Warunya ; Theeramunkong, Thanaruk ; Haruechaiyasak, Choochart
Author_Institution :
Sch. of Inf., Comput. & Commun. Technol., Thammasat Univ., Thailand
Abstract :
The microblogging service Twitter has grown dramatically among online users in Thailand. Communication on Twitter is lively and up to date, since users often express their feelings and sentiments in posts about current or newly emerging topics. Sentiment analysis on Twitter faces language-related challenges, such as short messages and variation in word usage, and it also faces the unbalanced class problem: on Twitter, people tend to post complaints more often than praise. In this paper, we propose a sampling-based method to address data imbalance in Thai Twitter sentiment analysis. Three sampling methods, namely random sampling, largest complete-link sampling, and largest average-link sampling, are applied as a preprocessing step before a k-NN classifier. In the experiments, largest average-link sampling achieves the highest performance, with a macro-average F-measure of 0.57, compared with the unbalanced case.
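The abstract names random sampling as one preprocessing option and reports a macro-average F-measure. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the abstract does not say whether classes are shrunk or enlarged, so shrinking every class to the size of the smallest one is an assumption, and the function names `random_undersample` and `macro_f_measure` are hypothetical. The complete-link and average-link variants are not sketched because the abstract does not define them.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class.

    Assumption: balancing is done by undersampling; the abstract only
    says that random sampling is applied before the k-NN classifier.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is reduced to the size of the rarest class.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


def macro_f_measure(true_labels, predicted_labels):
    """Unweighted mean of the per-class F1 scores."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)
```

Because the macro average weights every class equally, a classifier that predicts only the majority class scores poorly on it even when its accuracy is high, which is why it suits unbalanced data.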
Keywords :
natural language processing; pattern classification; random processes; sampling methods; social networking (online); text analysis; Thai sentiment analysis; Twitter posts; data unbalanceness; k-NN classifier; largest average-link sampling method; largest complete-link sampling method; macro average F-measure; microblogging; online users; random method; sampling-based method; short-length message; unbalanced class problem; unbalanced data; word usage variation; Couplings; Feature extraction; Mobile communication; Support vector machine classification; Training; Training data; Twitter; K-Nearest Neighbor; Resizing Training Dataset; Sentiment Analysis; Social Media Content; Unbalance Data;
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2013 10th International Joint Conference on
Conference_Location :
Maha Sarakham
Print_ISBN :
978-1-4799-0805-9
DOI :
10.1109/JCSSE.2013.6567345