DocumentCode :
3248563
Title :
Text document categorization by term association
Author :
Antonie, Maria-Luiza ; Zaïane, Osmar R.
Author_Institution :
Alberta Univ., Edmonton, Alta., Canada
fYear :
2002
fDate :
2002
Firstpage :
19
Lastpage :
26
Abstract :
A good text classifier is a classifier that efficiently categorizes large sets of text documents in a reasonable time frame and with an acceptable accuracy, and that provides classification rules that are human readable for possible fine-tuning. If the training of the classifier is also quick, this could become in some application domains a good asset for the classifier. Many techniques and algorithms for automatic text categorization have been devised. According to published literature, some are more accurate than others, and some provide more interpretable classification models than others. However, none can combine all the beneficial properties enumerated above. In this paper we present a novel approach for automatic text categorization that borrows from market basket analysis techniques using association rule mining in the data-mining field. We focus on two major problems: (1) finding the best term association rules in a textual database by generating and pruning; and (2) using the rules to build a text classifier. Our text categorization method proves to be efficient and effective, and experiments on well-known collections show that the classifier performs well. In addition, training as well as classification are both fast and the generated rules are human readable.
Keywords :
data mining; learning (artificial intelligence); pattern classification; text analysis; association rule mining; automatic text categorization; automatic text classification; machine learning; rule mining; term association; text categorization; text classifier; text documents; Association rules; Data mining; Electronic mail; Humans; Image databases; Indexing; Information retrieval; Machine learning; Text categorization; Transaction databases
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183881
Filename :
1183881