Title :
Grouping of Customer Opinions Written in Natural Language Using Unsupervised Machine Learning
Author :
Darena, F. ; Zizka, J. ; Burda, K.
Author_Institution :
Dept. of Inf., Mendel Univ., Brno, Czech Republic
Abstract :
Automatic categorization is one of the most topical current tasks in the processing of textual documents. Clustering, the most common form of unsupervised learning, enables automatic grouping of unlabeled documents into subsets called clusters. This paper examines the results of clustering very large real-world electronic data collections of customers' reviews written freely in English. The reviews are automatically clustered into two groups that should contain either positive or negative reviews. The paper focuses on analyzing why certain reviews are wrongly assigned to a group containing mostly reviews of a different class. A review is assigned to a cluster based on its properties, i.e., on the words that appear in it. Thus, words appearing in incorrectly categorized reviews were analyzed. It was found that words that are important for correct classification (and thus bear some sentiment) are often similarly important in a different set than expected and therefore do not act as misleading information, unlike words that are far less significant or quite insignificant.
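The idea of assigning a review to one of two clusters purely from the words it contains can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm, similarity measure, or text representation the authors used, so everything here is an assumption made for illustration: raw word counts, cosine similarity, and a k-means-style loop with two clusters seeded from the first two reviews. The function names and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Sum the word counts of a cluster's members.

    Cosine similarity ignores vector length, so the sum points in the
    same direction as the mean and no division is needed.
    """
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return total


def two_means(reviews, seeds=(0, 1), max_iterations=10):
    """Split reviews into two clusters (labels 0 and 1).

    Assumption: the two seed reviews serve as initial centroids; a real
    system would choose the initialization more carefully.
    """
    vectors = [bag_of_words(review) for review in reviews]
    centroids = [vectors[seeds[0]], vectors[seeds[1]]]
    labels = None
    for _ in range(max_iterations):
        # Assign each review to the more similar centroid.
        new_labels = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recompute each centroid from its current members.
        for k in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = centroid(members)
    return labels


# Hypothetical sample data, not taken from the paper.
reviews = [
    "great hotel friendly staff great location",
    "terrible room dirty bathroom rude staff",
    "friendly staff and great breakfast",
    "dirty room and terrible noise",
]
print(two_means(reviews))
```

Because the clusters are formed from word overlap alone, a word such as "staff" that occurs in both groups pulls reviews toward neither cluster in particular, which is the kind of word-level effect the abstract says was analyzed for wrongly assigned reviews.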
Keywords :
natural language processing; pattern classification; pattern clustering; text analysis; unsupervised learning; English; automatic categorization problem; automatic unlabeled document grouping; classification viewpoint; clusters; customer opinion grouping; customers reviews; electronic real-world data collections; natural language; textual documents processing; unsupervised machine learning; Clustering algorithms; Dictionaries; Entropy; Natural languages; Prediction algorithms; Training; Vectors; cluster mining; customer opinion; incorrect categorization; similarity; textual data
Conference_Title :
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 2012 14th International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-1-4673-5026-6
DOI :
10.1109/SYNASC.2012.29