• DocumentCode
    3127068
  • Title

    Multilingual Sentiment Analysis Using Latent Semantic Indexing and Machine Learning

  • Author

    Bader, Brett W. ; Kegelmeyer, W. Philip ; Chew, Peter A.

  • Author_Institution
    Sandia Nat. Labs., Albuquerque, NM, USA
  • fYear
    2011
  • fDate
    11-11 Dec. 2011
  • Firstpage
    45
  • Lastpage
    52
  • Abstract
    We present a novel approach to predicting the sentiment of documents in multiple languages, without translation. The only prerequisite is a multilingual parallel corpus wherein a training sample of the documents, in a single language only, has been tagged with its overall sentiment. Latent Semantic Indexing (LSI) converts that multilingual corpus into a multilingual "concept space". Both training and test documents can be projected into that space, allowing cross-lingual semantic comparisons between the documents without the need for translation. Accordingly, the training documents with known sentiment are used to build a machine learning model which can, because of the multilingual nature of the document projections, be used to predict sentiment in the other languages. We explain and evaluate the accuracy of this approach. We also design and conduct experiments to investigate the extent to which topic and sentiment separately contribute to that classification accuracy, and thereby shed some initial light on the question of whether topic and sentiment can be sensibly teased apart.
  • Keywords
    document handling; indexing; learning (artificial intelligence); natural language processing; document translation; latent semantic indexing; machine learning model; multilingual concept space; multilingual parallel corpus; multilingual sentiment analysis; Accuracy; Large scale integration; Machine learning; Predictive models; Semantics; Training; Vectors; Sentiment analysis; latent semantic analysis; machine learning; multilingual; parallel corpora
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    978-1-4673-0005-6
  • Type

    conf

  • DOI
    10.1109/ICDMW.2011.185
  • Filename
    6137359
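
The pipeline outlined in the abstract (build a multilingual concept space with LSI, project documents into it, train on one language, predict in another) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy English/Spanish corpus, the raw term counts, the choice of k = 2, the nearest-centroid classifier standing in for the unspecified "machine learning model", and every function name below are assumptions made for illustration only.

```python
import numpy as np

# Toy parallel corpus (invented for illustration): each entry holds the same
# document in English and Spanish. Only the English side is used with labels.
parallel_docs = [
    ("good great", "bueno genial", "pos"),
    ("great good good", "genial bueno bueno", "pos"),
    ("good", "bueno", "pos"),
    ("bad awful", "malo horrible", "neg"),
    ("awful bad awful", "horrible malo horrible", "neg"),
]

# Combined vocabulary across both languages.
vocab = sorted({w for en, es, _ in parallel_docs for w in (en + " " + es).split()})
index = {w: i for i, w in enumerate(vocab)}


def term_vector(text):
    """Raw term counts over the combined vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-by-document matrix: each column concatenates both language versions
# of one document (assumed construction; no term weighting is applied).
X = np.column_stack([term_vector(en + " " + es) for en, es, _ in parallel_docs])

# Truncated SVD: the first k left singular vectors span the "concept space".
k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_k = U[:, :k]


def project(text):
    """Fold a (monolingual) document into the k-dimensional concept space."""
    return U_k.T @ term_vector(text)


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# "Training": one centroid per sentiment label, built from English text only.
labels = sorted({lab for _, _, lab in parallel_docs})
centroids = {
    lab: np.mean([project(en) for en, _, l in parallel_docs if l == lab], axis=0)
    for lab in labels
}


def predict(text):
    """Return the label whose centroid is most cosine-similar to the document."""
    z = project(text)
    return max(labels, key=lambda lab: cosine(z, centroids[lab]))


# Spanish test documents, never seen with a label and never translated.
predictions = [predict(t) for t in ("genial bueno", "horrible malo malo")]
print(predictions)
```

Because Spanish and English terms that co-occur in the parallel columns load on the same singular vectors, a Spanish document lands near the English training documents with the same sentiment. A real corpus would need term weighting, a much larger k, and a stronger classifier than this stand-in.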