Title :
Mining textual significant expressions reflecting opinions in natural languages
Author :
Jan Žižka;František Dařena
Author_Institution :
Department of Informatics / SoNet Research Center, Mendel University in Brno, Brno, Czech Republic
Abstract :
Revealing an opinion hidden in a text document is a challenging task. The article presents a method based on the automatic extraction of expressions that are significant for determining a document's attitude to a given topic. The significant expressions are composed from significant words found in the documents. The significant words are selected by the C5 decision-tree generator, which is based on entropy minimization. Words included in the tree branches form the kernels of the significant expressions. The full expressions consist of the significant words together with the words surrounding them in the original documents. Such expressions provide much more information than individual (key)words and can be used for analysing a document's meaning and the cause of the opinion: what exactly does the opinion deal with? The results are demonstrated using large real-world multilingual data representing customers' opinions written in free form.
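The two steps named in the abstract (entropy-based selection of significant words, then expansion with neighbouring words) can be illustrated with a minimal sketch. It is not the authors' implementation: instead of the C5 decision-tree generator it ranks words by single-split information gain, which is only the criterion a tree applies at its root. The toy reviews, the function names, and the `top_k` and `window` parameters are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction from splitting the documents on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    remainder = ((len(with_word) / n) * entropy(with_word)
                 + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - remainder


def significant_words(docs, labels, top_k=2):
    """Stand-in for the tree generator: keep the top_k words by gain."""
    vocab = sorted({w for d in docs for w in d})
    ranked = sorted(vocab,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:top_k]


def expressions(docs, kernels, window=1):
    """Expand each kernel word with up to `window` neighbours on each side."""
    found = []
    for tokens in docs:
        for i, tok in enumerate(tokens):
            if tok in kernels:
                start = max(0, i - window)
                found.append(" ".join(tokens[start:i + window + 1]))
    return found


# Hypothetical toy data, not taken from the paper.
reviews = [
    ("the room was very clean", "pos"),
    ("clean and quiet hotel", "pos"),
    ("the room was dirty", "neg"),
    ("dirty bathroom and noisy street", "neg"),
]
docs = [text.split() for text, _ in reviews]
labels = [label for _, label in reviews]

kernels = significant_words(docs, labels, top_k=2)
found = expressions(docs, kernels, window=1)
print(kernels)
print(found)
```

A full decision tree would go further than this ranking, since it selects words recursively on the subsets produced by earlier splits, so the words along its branches can differ from a plain top-k list.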
Keywords :
"Entropy","Natural languages","Intelligent systems","Internet","Accuracy","Decision trees","Kernel"
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
Print_ISBN :
978-1-4577-1676-8
Electronic_ISBN :
2164-7151
DOI :
10.1109/ISDA.2011.6121644