Title :
A novel approach to sequence-of-documents focused text categorization using the concept of a degree of fuzzy set subsethood
Author :
Sławomir Zadrożny;Janusz Kacprzyk;Marek Gajewski
Author_Institution :
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447 Warszawa, Poland
Abstract :
This work is a step towards an effective and efficient procedure for a special type of text categorization problem. A set of documents and a set of categories are given. In addition to being assigned to a specific category, each document belongs to a certain sequence of documents, referred to as a case, which consists of documents from the same category. The problem considered is how to assign a document to the proper sequence of documents, or case, within a specified category. If each case is treated as a separate category, the training data available for each one is rather small. We propose an algorithm based on a combination of two indicators characterizing the document to be classified: one reflecting its similarity to a case and one reflecting its similarity to a category. Both indicators are based on a measure of subsethood of fuzzy sets. We study the effectiveness of the proposed algorithm for various combinations of the weights of the two indicators and of the subsethood measures employed.
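The abstract does not specify the subsethood measures or the weighting scheme, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes documents, cases, and categories are represented as fuzzy sets of terms (term to membership degree), uses Kosko's subsethood degree as one possible measure, and combines the two indicators as a convex combination. The names `subsethood`, `combined_score`, `best_case`, and the `weight` parameter are hypothetical and do not come from the paper.

```python
from typing import Dict

# Assumed representation: a fuzzy set of terms, term -> membership in [0, 1].
FuzzySet = Dict[str, float]


def subsethood(a: FuzzySet, b: FuzzySet) -> float:
    """Kosko-style degree to which fuzzy set `a` is contained in `b`.

    Computed as sum(min(a(x), b(x))) / sum(a(x)). This is one common
    subsethood measure, not necessarily one used by the authors.
    """
    total = sum(a.values())
    if total == 0.0:
        return 1.0  # convention: the empty set is a subset of every set
    overlap = sum(min(mu, b.get(term, 0.0)) for term, mu in a.items())
    return overlap / total


def combined_score(doc: FuzzySet, case: FuzzySet, category: FuzzySet,
                   weight: float) -> float:
    """Convex combination of the case indicator and the category indicator.

    `weight` is a hypothetical parameter: 1.0 uses only the case indicator,
    0.0 uses only the category indicator.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * subsethood(doc, case)
            + (1.0 - weight) * subsethood(doc, category))


def best_case(doc: FuzzySet, cases: Dict[str, FuzzySet],
              category: FuzzySet, weight: float = 0.5) -> str:
    """Return the name of the case with the highest combined score."""
    return max(cases,
               key=lambda name: combined_score(doc, cases[name],
                                               category, weight))
```

In this sketch all candidate cases share one category, so the category indicator shifts every score by the same amount; in a setting with cases drawn from several categories, each case would be scored against its own category's fuzzy set.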
Keywords :
"Text categorization","Hidden Markov models","Training","Weight measurement","Pragmatics","Fuzzy sets","Standards"
Conference_Title :
2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC)
DOI :
10.1109/NAFIPS-WConSC.2015.7284173