DocumentCode :
2308476
Title :
Sentiment classification for Indonesian message in social media
Author :
Naradhipa, Aqsath Rasyid ; Purwarianti, Ayu
Author_Institution :
Sch. of Electr. & Inf. Engineering, Bandung Inst. of Technol., Bandung, Indonesia
fYear :
2012
fDate :
26-27 April 2012
Firstpage :
1
Lastpage :
5
Abstract :
Classifying sentiment in social media has become strategically important because people can express their feelings about a subject easily and in short text. Mining opinions from social media matters because people are usually honest about their feelings there. In our research, we tried to identify the problems of classifying sentiment in Indonesian social media. We found that people tend to express their opinions in text, while emoticons are rarely used and are sometimes misleading. We also found that Indonesian social media opinions can be classified not only as positive, negative, neutral, and question, but also as a special mixed case between the negative and question types. There are two levels of problems: word level and sentence level. Word-level problems include the use of punctuation marks, the use of numbers to replace letters, misspelled words, and nonstandard abbreviations. At the sentence level, the problem relates to the sentiment types mentioned above. We built a sentiment classification system with several steps: text preprocessing, feature extraction, and classification. Text preprocessing aims to transform informal text into formal text. The word formalization methods we use are punctuation removal, tokenization, number-to-letter conversion, reduction of repeated letters, and abbreviation formalization using a corpus with Levenshtein distance. The sentence formalization methods we use are negation handling, sentiment relative, and affix handling. Rule-based, SVM, and Maximum Entropy classifiers are used, with features consisting of the counts of positive, negative, and question words in a sentence, plus bigrams. In our experiments, the best classification method is SVM, which yields 83.5% accuracy.
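The word-level formalization steps named in the abstract (punctuation removal, tokenization, number-to-letter conversion, repeated-letter reduction, and Levenshtein matching against a corpus) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the digit map, the tiny lexicon, the distance threshold, and all function names are assumptions made for the example.

```python
import re

# Hypothetical digit-to-letter map; the abstract does not list the actual substitutions.
DIGIT_TO_LETTER = {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s"}

# Tiny illustrative lexicon of formal Indonesian words (an assumption, not the authors' corpus).
LEXICON = ["bagus", "tidak", "sangat", "senang"]


def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def formalize_word(word, max_distance=2):
    """Map one informal token to a lexicon word when a close match exists."""
    if word.isdigit():
        return word  # leave pure numbers such as years untouched
    # Replace digits used as letters, e.g. "b4gus" -> "bagus".
    word = "".join(DIGIT_TO_LETTER.get(ch, ch) for ch in word)
    # Collapse runs of a repeated character; this also shortens legitimate
    # double letters, which the lexicon lookup below may restore.
    word = re.sub(r"(.)\1+", r"\1", word)
    if word in LEXICON:
        return word
    best = min(LEXICON, key=lambda w: levenshtein(word, w))
    # max_distance is an assumed threshold, not a value from the source.
    return best if levenshtein(word, best) <= max_distance else word


def formalize(text):
    """Lowercase, drop punctuation, tokenize on whitespace, formalize each token."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [formalize_word(token) for token in text.split()]


print(formalize("B4guuus, tdk!!!"))
```

In this sketch a token far from every lexicon entry is returned unchanged, so unknown words pass through instead of being forced onto the nearest entry.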
Keywords :
classification; natural language processing; text analysis; Indonesian message; Indonesian social media opinion; SVM; bigram; emoticon; feature extraction; informal text; maximum entropy; mining opinion; misspelled word; negation handling; nonstandard abbreviation; number usage; punctuation mark; sentence formalization; sentence level; sentiment classification system; text preprocessing; tokenization; word formalization method; word level problems; Accuracy; Companies; Dictionaries; Entropy; Media; Noise measurement; Support vector machines; Machine Learning; Sentiment Classification; Social Media;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud Computing and Social Networking (ICCCSN), 2012 International Conference on
Conference_Location :
Bandung, West Java
Print_ISBN :
978-1-4673-1815-0
Type :
conf
DOI :
10.1109/ICCCSN.2012.6215730
Filename :
6215730