Title :
Data mining method from text database based on fuzzy quantification analysis
Author :
Aoki, Keisuke ; Watada, Junzo
Author_Institution :
Graduate Sch. of Inf., Production & Syst., Waseda Univ., Fukuoka, Japan
Abstract :
Recently, multimedia technology has made various types of data available for information processing. In particular, fuzzy systems employ linguistic data as well as fuzzy numerical values. In this paper we propose a text mining method based on a fuzzy quantification model. The text mining process consists of the following steps: 1) Sentences in a Japanese text are broken down into words. 2) A fuzzy thesaurus, which translates words into synonyms or into upper-level concepts, makes a common understanding possible. In this paper, we take as keywords the words written in Chinese characters (kanji) or in runs of more than one consecutive katakana letter (katakana being a Japanese phonetic script). Because no dictionary is needed to separate words, the method achieves high processing speed. Fuzzy multivariate analysis is then applied to the processed data to extract the latent structure of mutual relations underlying the data. In other words, we extract knowledge from the given text data. Finally, we apply the method to mining text information from libraries and from Web pages distributed over the Web, and we discuss its application to Kansei engineering.
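The dictionary-free keyword step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that "keywords" means maximal runs of kanji (of any length) or runs of two or more katakana letters, and the function name `extract_keywords` and the exact Unicode ranges are hypothetical choices made for illustration only.

```python
import re

# Assumed character classes (illustrative, not taken from the paper):
#   kanji    : CJK Unified Ideographs, U+4E00..U+9FFF
#   katakana : letters U+30A1..U+30FA plus the long-vowel mark U+30FC
# Kanji runs of any length are kept; katakana runs must have 2+ letters.
KEYWORD_PATTERN = re.compile(r"[\u4E00-\u9FFF]+|[\u30A1-\u30FA\u30FC]{2,}")


def extract_keywords(text):
    """Return candidate keywords in order of appearance.

    No word-segmentation dictionary is used: the script of each
    character alone decides where a keyword starts and ends.
    """
    return KEYWORD_PATTERN.findall(text)


if __name__ == "__main__":
    sentence = "ファジィ理論をテキストマイニングに応用する"
    for keyword in extract_keywords(sentence):
        print(keyword)
```

Under these assumptions, hiragana particles and verb endings act as natural delimiters, which is why no segmentation dictionary is needed. The later stages (fuzzy thesaurus lookup and fuzzy multivariate analysis) are not sketched here because the abstract does not specify them.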
Keywords :
data mining; fuzzy systems; language translation; natural languages; text analysis; Kansei engineering; data mining method; fuzzy multivariate analysis; fuzzy quantification analysis; fuzzy system; information processing; linguistic data; multimedia technology; text database; text mining process; Data analysis; Data mining; Dictionaries; Fuzzy systems; Information processing; Libraries; Multimedia databases; Text mining; Thesauri; Web pages;
Conference_Title :
Systems, Man and Cybernetics, 2004 IEEE International Conference on
Print_ISBN :
0-7803-8566-7
DOI :
10.1109/ICSMC.2004.1401419