• DocumentCode
    189136
  • Title

    The Role of Text Pre-processing in Opinion Mining on a Social Media Language Dataset

  • Author

    Dos Santos, Fernando Leandro ; Ladeira, Marcelo

  • Author_Institution
    CIC-UnB Univ. of Brasilia, Brasilia, Brazil
  • fYear
    2014
  • fDate
    18-22 Oct. 2014
  • Firstpage
    50
  • Lastpage
    54
  • Abstract
    This work describes an opinion mining application over a dataset extracted from the web and composed of reviews containing Internet slang, abbreviations, and typographical errors. Opinion mining is a field of study that tries to identify and classify subjectivity, such as opinions, emotions, or sentiments, in natural language. In this research, 759,176 Portuguese reviews were extracted from the Google Play app store. Due to the large number of reviews, large-scale processing techniques were needed, involving powerful frameworks such as Hadoop and Mahout. Based on the tests conducted, it was concluded that pre-processing plays an insignificant role in the opinion mining task for the specific domain of mobile app reviews. The work also contributed a corpus of 759 thousand reviews and a dictionary of slang and abbreviations commonly used on the Internet.
  • Keywords
    Internet; data mining; mobile computing; natural language processing; text analysis; Google Play app store; Hadoop; Internet abbreviations; Internet slangs; Internet typo errors; Mahout; Portuguese reviews; mobile app reviews; opinion mining; social media language dataset; text preprocessing; Data mining; Dictionaries; Internet; Logistics; Matrix converters; Sentiment analysis; Support vector machines; large-scale data processing; opinion mining; sentiment analysis; text mining; text pre-processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (BRACIS), 2014 Brazilian Conference on
  • Conference_Location
    Sao Paulo
  • Type

    conf

  • DOI
    10.1109/BRACIS.2014.20
  • Filename
    6984806