Title :
Sentiment Polarity Classification Using Statistical Data Compression Models
Author :
Ziegelmayer, D.; Schrader, R.
Author_Institution :
Inst. für Inf., Univ. zu Köln, Köln, Germany
Abstract :
With the growing availability and popularity of user-generated content, the discipline of sentiment analysis has come to the attention of many researchers. Existing work has mainly focused on either knowledge-based methods or standard machine learning techniques. In this paper we investigate sentiment polarity classification based on adaptive statistical data compression models. We evaluate the classification performance of the lossless compression algorithm Prediction by Partial Matching (PPM) as well as compression-based measures using PPM-like character n-gram frequency statistics. Comprehensive experiments on three corpora show that compression-based methods are efficient, easy to apply, and can compete with the accuracy of sophisticated classifiers such as support vector machines.
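As a rough illustration of the idea of compression-based classification, the sketch below trains one character n-gram model per class and assigns a text to the class whose model would encode it in the fewest bits. It is a minimal sketch under stated assumptions, not the authors' method: it uses a fixed-order model with add-one smoothing instead of real PPM (which blends several context orders through escape probabilities), and the names `CharNgramModel` and `classify`, the order of 3, the alphabet size of 256, and the four training sentences are all hypothetical.

```python
import math
from collections import Counter


class CharNgramModel:
    """Fixed-order character n-gram model with add-one smoothing.

    Assumption: a simplified stand-in for PPM-like statistics, not true PPM.
    """

    def __init__(self, order=3, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol inventory size
        self.context_counts = Counter()
        self.ngram_counts = Counter()

    def train(self, text):
        # Pad with spaces so the first characters also have a full context.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            self.context_counts[context] += 1
            self.ngram_counts[context + padded[i]] += 1

    def bits(self, text):
        """Estimated code length of `text` in bits under this model."""
        padded = " " * self.order + text
        total = 0.0
        for i in range(self.order, len(padded)):
            context = padded[i - self.order:i]
            seen = self.ngram_counts[context + padded[i]]
            p = (seen + 1) / (self.context_counts[context] + self.alphabet_size)
            total -= math.log2(p)
        return total


def classify(text, models):
    """Return the label whose model yields the shortest estimated encoding."""
    return min(models, key=lambda label: models[label].bits(text))


# Hypothetical toy training data, for illustration only.
training = {
    "positive": ["great movie, loved it", "wonderful acting and a great story"],
    "negative": ["terrible movie, hated it", "awful acting and a boring story"],
}

models = {}
for label, texts in training.items():
    model = CharNgramModel(order=3)
    for t in texts:
        model.train(t)
    models[label] = model

print(classify("a great and wonderful film", models))
print(classify("terrible and boring", models))
```

The per-class code length plays the role of a cross-entropy estimate: a text that shares many character sequences with one class's training data costs fewer bits under that class's model.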
Keywords :
computational linguistics; data compression; data mining; knowledge based systems; learning (artificial intelligence); pattern classification; pattern matching; statistical analysis; PPM; adaptive statistical data compression model; character n-gram frequency statistics; classification performance evaluation; compression based measure; corpora; knowledge based method; lossless compression algorithm; machine learning; prediction by partial matching; sentiment analysis; sentiment polarity classification; user generated content; Accuracy; Compression algorithms; Computational modeling; Entropy; Frequency measurement; Support vector machines; Training; opinion mining; text classification
Conference_Title :
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-5164-5
DOI :
10.1109/ICDMW.2012.43