Title :
Text categorization with the concept of fuzzy set of informative keywords
Author_Institution :
Samsung SDS, South Korea
Abstract :
Text categorization is the task of assigning a document to one of a set of predefined categories. Informative keywords are those that reflect the contents of a document, and a document contains both informative and non-informative keywords. Non-informative keywords mainly serve grammatical functions in sentences. Such keywords, called functional keywords, reflect a document's contents very little, so they should be removed during document indexing. However, the distinction between informative keywords and functional keywords is not crisp. In conventional document indexing, a document is represented as a crisp set of informative keywords. This paper proposes representing a document as a fuzzy set of informative keywords instead. Experiments on the categorization of news articles show that the proposed text categorization schemes outperform schemes based on crisp sets.
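The following is a minimal sketch of the idea described in the abstract, not the paper's actual method. It represents a document as a fuzzy set that maps each keyword to a degree of membership in the set of informative keywords, and contrasts it with a crisp baseline that keeps a keyword only when its degree reaches a threshold. Everything concrete here is an assumption chosen for illustration: the hypothetical `INFORMATIVENESS` table of degrees, the 0.5 threshold, building category profiles by fuzzy union (maximum), and scoring with a fuzzy Jaccard similarity (sum of minima over sum of maxima).

```python
# Hypothetical degrees of "informativeness" in [0, 1]; illustrative only,
# NOT values from the paper. In practice they might be estimated from a corpus.
INFORMATIVENESS = {
    "the": 0.0, "of": 0.0, "and": 0.05, "said": 0.1, "new": 0.3,
    "market": 0.8, "stocks": 0.9, "shares": 0.85,
    "goal": 0.9, "match": 0.7, "season": 0.5,
}


def fuzzy_document(tokens, degrees=INFORMATIVENESS, default=0.0):
    """Fuzzy representation: each distinct term -> its membership degree."""
    return {t: degrees.get(t, default) for t in set(tokens)}


def crisp_document(tokens, degrees=INFORMATIVENESS, threshold=0.5):
    """Crisp baseline (assumed alpha-cut): keep a term only if degree >= threshold."""
    return {t: 1.0 for t in set(tokens) if degrees.get(t, 0.0) >= threshold}


def category_profile(doc_sets):
    """Fuzzy union (pointwise max) of a category's training documents."""
    profile = {}
    for d in doc_sets:
        for term, mu in d.items():
            profile[term] = max(profile.get(term, 0.0), mu)
    return profile


def fuzzy_jaccard(a, b):
    """Similarity of two fuzzy sets: sum of min memberships / sum of max memberships."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def categorize(doc_set, profiles):
    """Pick the category whose profile is most similar to the document."""
    return max(profiles, key=lambda c: fuzzy_jaccard(doc_set, profiles[c]))


# Toy training data (hypothetical).
train = {
    "business": [["the", "market", "and", "stocks"], ["shares", "said", "new"]],
    "sports": [["the", "goal", "of", "the", "match"], ["season", "said"]],
}
profiles = {
    c: category_profile([fuzzy_document(d) for d in docs]) for c, docs in train.items()
}
print(categorize(fuzzy_document(["stocks", "and", "shares", "said"]), profiles))  # → business
```

The design choice the sketch illustrates is that a weakly informative term such as "said" still contributes a small membership degree in the fuzzy representation, whereas the crisp baseline must either keep it fully or discard it.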
Keywords :
category theory; data mining; fuzzy set theory; indexing; document indexing; functional keywords; informative keywords; text categorization; Data mining; Fuzzy sets; Hardware; Indexing; Information analysis; Internet; Network synthesis; Pattern analysis; Text categorization; Text mining;
Conference_Title :
Fuzzy Systems Conference Proceedings, 1999. FUZZ-IEEE '99. 1999 IEEE International
Conference_Location :
Seoul, South Korea
Print_ISBN :
0-7803-5406-0
DOI :
10.1109/FUZZY.1999.793010