Title :
Combining Conditional Random Fields and first-order logic for modeling hidden content structure in sentiment analysis
Author :
Lei Wu ; Yinghua Zhang ; Wensheng Zhang ; Jue Wang
Author_Institution :
Inst. of Autom., Beijing, China
Abstract :
The paper develops a connection between first-order logic representations and content structure models in sentiment analysis applications. We propose a modified semi-supervised approach that learns word-level content structure using carefully designed first-order logic features. The word-level content structure is modeled as a Conditional Random Field (CRF) with latent word-level topic nodes. Introducing first-order logic features into the model can address the long-distance dependency problem. The new approach is applied to two multi-aspect sentiment analysis tasks, multi-aspect sentence labeling and multi-aspect rating prediction, using data from an Amazon corpus and a movie-review corpus. We compare our method with three other graphical models that have hidden nodes: Latent Dirichlet Allocation (LDA), the Hidden-Unit CRF (HUCRF), and Content Structure using CRF (CSCRF), which serves as our sentence-level baseline. Experimental results show that our method outperforms the sentence-level baseline by 2.1% in F1 measure on the multi-aspect sentence labeling task and by 2.1% in accuracy on the rating prediction task. It outperforms the other two methods by up to 16.6% on the sentence labeling task and by up to 10.3% on the rating prediction task. With unlabeled data, our method improves the F1 measure on the sentence labeling task by 8.2% using 3000 unlabeled documents, and improves accuracy on the rating prediction task by 3.0% using 400 unlabeled reviews.
Keywords :
formal logic; random processes; text analysis; Amazon corpus; CSCRF; F1 measure; HUCRF; LDA; conditional random fields; first-order logic features; first-order logic representation; hidden content structure modeling; hidden nodes graphical models; hidden-unit CRF; latent Dirichlet allocation; latent word-level topic nodes; long-distance dependency problem; movie-review corpus; multiaspect rating prediction task; multiaspect sentence labeling task; multiaspect sentiment analysis tasks; semisupervised approach; sentence-level baseline; unlabeled documents; word-level content structure; Accuracy; Computational modeling; Graphical models; Hidden Markov models; Labeling; Markov random fields; Training; First-order logic; conditional random fields; probabilistic graphical model; topic model;
Conference_Title :
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location :
Shenyang
DOI :
10.1109/ICNC.2013.6818138