Title :
The Role of Text Pre-processing in Opinion Mining on a Social Media Language Dataset
Author :
Dos Santos, Fernando Leandro ; Ladeira, Marcelo
Author_Institution :
CIC-UnB Univ. of Brasilia, Brasilia, Brazil
Abstract :
This work describes an opinion mining application over a dataset extracted from the web and composed of reviews containing Internet slang, abbreviations and typos. Opinion mining is a field of study that tries to identify and classify subjectivity, such as opinions, emotions or sentiments, in natural language. In this research, 759,176 Portuguese-language reviews were extracted from the Google Play app store. Due to the large number of reviews, large-scale processing techniques were needed, involving frameworks such as Hadoop and Mahout. Based on the tests conducted, it was concluded that pre-processing plays an insignificant role in the opinion mining task for the specific domain of mobile app reviews. The work also contributed a corpus of 759 thousand reviews and a dictionary of slang and abbreviations commonly used on the Internet.
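As an illustration only, the kind of dictionary-based pre-processing the abstract refers to (expanding Internet slang and abbreviations before classification) might look like the minimal sketch below. The dictionary entries, the function name `normalize_review`, and the regex tokenizer are hypothetical assumptions for this sketch; they are not the authors' dictionary or pipeline, which ran at scale on Hadoop and Mahout.

```python
import re

# Hypothetical mini-dictionary of common Portuguese Internet slang/abbreviations.
# This is NOT the dictionary published with the paper; it only illustrates the idea.
SLANG_DICT = {
    "vc": "você",
    "q": "que",
    "pq": "porque",
    "blz": "beleza",
    "tb": "também",
}

# Tokens are either runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def normalize_review(text: str, slang: dict = SLANG_DICT) -> str:
    """Lowercase a review, tokenize it, and expand any token found in the slang dictionary."""
    tokens = TOKEN_RE.findall(text.lower())
    expanded = [slang.get(tok, tok) for tok in tokens]
    return " ".join(expanded)


if __name__ == "__main__":
    print(normalize_review("Vc q baixou? App blz, tb recomendo!"))
```

Running it prints `você que baixou ? app beleza , também recomendo !`. A plain dictionary lookup keeps the step cheap and easy to parallelize, which matters at the scale of hundreds of thousands of reviews. Typo correction would need a separate step (for example, edit-distance matching) that this sketch does not attempt.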
Keywords :
Internet; data mining; mobile computing; natural language processing; text analysis; Google Play app store; Hadoop; Internet abbreviations; Internet slangs; Internet typo errors; Mahout; Portuguese reviews; mobile app reviews; opinion mining; social media language dataset; text preprocessing; Data mining; Dictionaries; Internet; Logistics; Matrix converters; Sentiment analysis; Support vector machines; large-scale data processing; opinion mining; sentiment analysis; text mining; text pre-processing
Conference_Title :
Intelligent Systems (BRACIS), 2014 Brazilian Conference on
Conference_Location :
Sao Paulo
DOI :
10.1109/BRACIS.2014.20