• DocumentCode
    624129
  • Title

    Solving unbalanced data for Thai sentiment analysis

  • Author

    Wunnasri, Warunya ; Theeramunkong, Thanaruk ; Haruechaiyasak, Choochart

  • Author_Institution
    Sch. of Inf., Comput. & Commun. Technol., Thammasat Univ., Thailand
  • fYear
    2013
  • fDate
    29-31 May 2013
  • Firstpage
    200
  • Lastpage
    205
  • Abstract
    The growth of the microblogging service Twitter among online users in Thailand has been dramatic. Communication on Twitter is lively and up to date, since users often express their feelings and sentiments in posts about current or newly emerging topics. Sentiment analysis on Twitter faces language-related challenges, such as short messages and variation in word usage, and it also faces the unbalanced class problem: on Twitter, people tend to post complaints more often than praise. In this paper, we propose a sampling-based method to address data imbalance in Thai Twitter sentiment analysis. Three sampling methods, namely random sampling, largest complete-link sampling, and largest average-link sampling, are applied as a preprocessing step before a k-NN classifier. In the experimental results, largest average-link sampling achieves the highest performance, with a macro-averaged F-measure of 0.57, compared with the unbalanced case.
  • Keywords
    natural language processing; pattern classification; random processes; sampling methods; social networking (online); text analysis; Thai sentiment analysis; Twitter posts; data unbalanceness; k-NN classifier; largest average-link sampling method; largest complete-link sampling method; macro average F-measure; microblogging; online users; random method; sampling-based method; short-length message; unbalanced class problem; unbalanced data; word usage variation; Couplings; Feature extraction; Mobile communication; Support vector machine classification; Training; Training data; Twitter; K-Nearest Neighbor; Resizing Training Dataset; Sentiment Analysis; Social Media Content; Unbalance Data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering (JCSSE), 2013 10th International Joint Conference on
  • Conference_Location
    Maha Sarakham
  • Print_ISBN
    978-1-4799-0805-9
  • Type

    conf

  • DOI
    10.1109/JCSSE.2013.6567345
  • Filename
    6567345
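
The abstract names random sampling as one of three ways to resize the training set and reports results as a macro-averaged F-measure. The sketch below illustrates only those two generic ideas in plain Python; it is not the authors' implementation, and it does not attempt the complete-link or average-link sampling variants, whose details the abstract does not give. The function names `random_undersample` and `macro_f1`, the choice to shrink every class to the size of the smallest one, and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly shrink every class to the size of the smallest class.

    Assumption: the balancing target is the minority-class size; the
    abstract does not state the target size used in the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = min(len(items) for items in by_class.values())

    balanced = []
    for y, items in by_class.items():
        # Draw `target` items from this class without replacement.
        for x in rng.sample(items, target):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F-measure)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)
```

Because the macro average weights every class equally, a classifier that always predicts the majority class scores poorly on it even when its plain accuracy looks high, which is why this measure suits unbalanced data.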