Title :
An Unsupervised Data-Driven Cross-Lingual Method for Building High Precision Sentiment Lexicons
Author :
Sangiorgi, Pierluca ; Augello, Agnese ; Pilato, Giovanni
Author_Institution :
ICAR (Ist. di Calcolo e Reti ad Alte Prestazioni), Palermo, Italy
Abstract :
In this paper we present a completely unsupervised approach for creating a sentiment lexicon. The approach is realized as a pipeline that covers several stages: the automatic extraction of user reviews, text pre-processing, a scoring measure that combines entropy, term frequency, and inverse document frequency, and finally a cross-lingual intersection. We have validated the approach through the analysis of user reviews available on the Google Play market. The results show the effectiveness of the approach, as indicated by satisfactory precision values for the obtained lexicon.
Keywords :
computational linguistics; entropy; information retrieval; text analysis; unsupervised learning; Google Play market; cross lingual intersection; entropy; high precision sentiment lexicons; inverse document frequency; scoring measure; term frequency; text preprocessing; unsupervised data-driven cross-lingual method; unsupervised system; user reviews automatic extraction; Buildings; Dictionaries; Entropy; Frequency measurement; Google; Pipelines; Pragmatics; Machine Learning; Sentiment Analysis; Sentiment Lexicon;
Conference_Title :
Semantic Computing (ICSC), 2013 IEEE Seventh International Conference on
Conference_Location :
Irvine, CA
DOI :
10.1109/ICSC.2013.40