• DocumentCode
    2690544
  • Title

    Generic high-throughput methods for multilingual sentiment detection

  • Author

    Gindl, Stefan ; Scharl, Arno ; Weichselbraun, A.

  • Author_Institution
    Dept. of New Media Technol., MODUL Univ. Vienna, Vienna, Austria
  • fYear
    2010
  • fDate
    13-16 April 2010
  • Firstpage
    239
  • Lastpage
    244
  • Abstract
    Digital ecosystems typically involve a large number of participants from different sectors who generate rapidly growing archives of unstructured text. Measuring the frequency of certain terms to determine the popularity of a topic is comparatively straightforward. Detecting sentiment expressed in user-generated electronic content is more challenging, especially in the case of digital ecosystems comprising heterogeneous sets of multilingual documents. This paper describes the use of language-specific grammar patterns and multilingual tagged dictionaries to detect sentiment in German and English document repositories. Digital ecosystems may contain millions of frequently updated documents, requiring sentiment detection methods that maximize throughput. The ideal combination of high-throughput techniques and more accurate (but slower) approaches depends on the specific requirements of an application. To accommodate a wide range of possible applications, this paper presents (i) an adaptive method, balancing accuracy and scalability of multilingual textual sources, (ii) a generic approach for generating language-specific grammar patterns and multilingual tagged dictionaries, and (iii) an extensive evaluation verifying the method's performance based on Amazon product reviews and user evaluations from Sentiment Quiz, a “game with a purpose” that invites users of the Facebook social networking platform to assess the sentiment of individual sentences.
  • Keywords
    dictionaries; grammars; natural language processing; pattern recognition; social networking (online); text analysis; English document repositories; Facebook social networking platform; German document repositories; digital ecosystems; generic high-throughput methods; language-specific grammar patterns; multilingual sentiment detection; multilingual tagged dictionaries; unstructured text; Accuracy; Conferences; Dictionaries; Ecosystems; Feature extraction; Grammar; Media; evaluation; grammar patterns; semantic orientation; tagged dictionaries
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Ecosystems and Technologies (DEST), 2010 4th IEEE International Conference on
  • Conference_Location
    Dubai
  • ISSN
    2150-4938
  • Print_ISBN
    978-1-4244-5551-5
  • Type

    conf

  • DOI
    10.1109/DEST.2010.5610641
  • Filename
    5610641