Title :
On Learning Parsimonious Models for Extracting Consumer Opinions
Author :
Bai, Xue ; Padman, Rema ; Airoldi, Edoardo
Author_Institution :
Carnegie Mellon University
Abstract :
Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine on-line opinions from the Internet and learn customers' preferences for economic or marketing research, or for leveraging a strategic advantage. In this paper, we propose a two-stage Bayesian algorithm that captures the dependencies among words and, at the same time, finds a vocabulary that is efficient for the purpose of extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm selects a parsimonious feature set with substantially fewer predictor variables than in the full data set and leads to better predictions about sentiment orientations than several state-of-the-art machine learning methods. Our findings suggest that sentiments are captured by conditional dependence relations among words, rather than by keywords or high-frequency words.
Keywords :
Bayesian methods; Computer science; Data mining; Data privacy; Internet; Laboratories; Machine learning; Motion pictures; Public policy; Vocabulary
Conference_Title :
System Sciences, 2005. HICSS '05. Proceedings of the 38th Annual Hawaii International Conference on
Print_ISBN :
0-7695-2268-8
DOI :
10.1109/HICSS.2005.465