DocumentCode :
3726657
Title :
A New Two-Stage Approach to the Multiaspect Text Categorization
Author :
Slawomir Zadrozny;Janusz Kacprzyk;Marek Gajewski
Author_Institution :
Syst. Res., Warsaw, Poland
fYear :
2015
Firstpage :
1484
Lastpage :
1490
Abstract :
We consider a particular type of text categorization problem, which we refer to as multiaspect classification. It is inspired by a practical scenario of business document management in a company but has broader application potential. A distinguishing feature of the new problem is the existence of two classification schemes. The first is based on a traditional, static set of text categories, possibly arranged into a hierarchy. The second is based on a dynamic structure of sequences of documents, referred to as cases, identified within each category. While the former problem may be addressed using one of the well-known techniques of text categorization (classification), the latter seems to require distinct approaches, both because the set of cases is unknown in advance and because only a limited number of training documents is assumed to be available if a case is interpreted as a classic category. In the paper, we discuss the problem in more detail and show the applicability of an intuitively appealing two-stage approach to solving such a multiaspect text categorization problem.
Keywords :
"Text categorization","Companies","Standards","Hidden Markov models","Information processing"
Publisher :
IEEE
Conference_Titel :
Computational Intelligence, 2015 IEEE Symposium Series on
Print_ISBN :
978-1-4799-7560-0
Type :
conf
DOI :
10.1109/SSCI.2015.210
Filename :
7376786