DocumentCode :
2757039
Title :
Supervised term weighting for sentiment analysis
Author :
Nguyen, Tam T. ; Chang, Kuiyu ; Hui, Siu Cheung
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2011
fDate :
10-12 July 2011
Firstpage :
89
Lastpage :
94
Abstract :
Vector space text classification is commonly used in intelligence applications such as email and conversation analysis. In this paper we propose a supervised term weighting scheme called tf × KL (term frequency Kullback-Leibler), which weights each word in proportion to the ratio of its document frequencies in the positive and negative classes. We then generalize tf × KL to deal effectively with class imbalance, which is very common in real-world intelligence analysis. The generalized tf × KL weights each word according to the ratio of the positive and negative class-conditioned word probabilities instead of the raw document frequencies. Results on four classification datasets show that tf × KL performs consistently better than the baseline tf × idf and four other supervised term weighting schemes, including the recently proposed tf × rf (term frequency relevance frequency). The generalized tf × KL was found to be extremely robust in dealing with highly skewed class distributions, beating the second runner-up by more than 20% on a dataset that has only 10% positive training examples. The generalized tf × KL is thus an effective and robust term weighting scheme that can significantly improve binary classification performance in sentiment analysis and intelligence applications.
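The abstract describes the generalized weighting only in words, so the following is a minimal sketch of the idea rather than the authors' exact formula. It assumes that the class-conditioned word probability is estimated as the fraction of documents in a class that contain the word, that add-one style smoothing is applied, and that the weight is the absolute log of the ratio of the two probabilities. The function names `class_ratio_weights` and `weight_document` and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def class_ratio_weights(docs, labels, smoothing=1.0):
    """Map each term to |log(P(term | pos) / P(term | neg))|.

    Assumption: P(term | class) is the smoothed fraction of documents in
    that class containing the term. This is an illustrative choice, not
    the formula from the paper.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos

    # Document frequency of each term within each class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == 1 else df_neg
        target.update(set(tokens))  # count each term once per document

    weights = {}
    for term in set(df_pos) | set(df_neg):
        # Dividing by the class size is what makes the ratio insensitive
        # to class imbalance, unlike a ratio of raw document frequencies.
        p_pos = (df_pos[term] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (df_neg[term] + smoothing) / (n_neg + 2 * smoothing)
        weights[term] = abs(math.log(p_pos / p_neg))
    return weights


def weight_document(tokens, weights):
    """Multiply each term's raw frequency by its class-ratio weight."""
    tf = Counter(tokens)
    return {term: count * weights.get(term, 0.0) for term, count in tf.items()}


docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"]]
labels = [1, 1, 0]  # 1 = positive class, 0 = negative class
weights = class_ratio_weights(docs, labels)
vector = weight_document(["good", "good", "movie"], weights)
```

In this toy corpus "good" appears only in positive documents and "movie" appears in both classes, so "good" receives the larger weight. Terms unseen during training get a weight of zero in `weight_document`, which is another assumption of the sketch.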
Keywords :
pattern classification; text analysis; Kullback-Leibler; conversation analysis; document frequencies; document frequency; email analysis; intelligence analysis; negative class; positive class; sentiment analysis; supervised term weighting; vector space text classification; Benchmark testing; Communities; Educational institutions; Support vector machines
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-0082-8
Type :
conf
DOI :
10.1109/ISI.2011.5984056
Filename :
5984056