DocumentCode :
3363219
Title :
Fast text classification: a training-corpus pruning based approach
Author :
Zhou, Shuigeng ; Ling, Tok Wang ; Guan, Jihong ; Hu, Jiangtao ; Zhou, Aoying
Author_Institution :
Dept. of Comput. Sci. & Eng., Fudan Univ., Shanghai, China
fYear :
2003
fDate :
26-28 March 2003
Firstpage :
127
Lastpage :
136
Abstract :
With the rapid growth of available on-line information, text classification is becoming increasingly important. kNN is a widely used, high-performance text classification method. However, it is inefficient because it requires a large amount of computation to evaluate the similarity between a test document and every training document. In this paper, we propose a fast kNN text classification approach based on pruning the training corpus. With this approach, the training corpus can be condensed sharply, so the time spent on kNN searching is cut significantly; consequently, classification efficiency improves substantially while classification performance remains comparable to that obtained without pruning. An effective algorithm for text corpus pruning is designed. Experiments on the Reuters corpus validate the practicability of the proposed approach. Our approach is especially suitable for on-line text classification applications.
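The abstract does not spell out the authors' pruning algorithm, so the sketch below only illustrates the general idea under stated assumptions: plain kNN over bag-of-words vectors with cosine similarity, where classification cost grows with the number of training documents, plus a pruning step. The pruning step is a swapped-in technique, Hart's condensed nearest neighbour rule, and is NOT the method proposed in this paper. The function names (`tf_vector`, `cosine`, `knn_classify`, `condense`) and the six-document toy corpus are hypothetical and are not Reuters data.

```python
import math
from collections import Counter


def tf_vector(text):
    # Hypothetical helper: bag-of-words term frequencies (lowercase, whitespace split).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(doc, corpus, k=3):
    # corpus: list of (vector, label). Every training vector is scored, so the
    # cost is linear in len(corpus); this is what pruning aims to reduce.
    v = tf_vector(doc)
    neighbours = sorted(corpus, key=lambda item: cosine(v, item[0]), reverse=True)[:k]
    votes = Counter()
    for vec, label in neighbours:
        votes[label] += cosine(v, vec)  # similarity-weighted vote
    return votes.most_common(1)[0][0]


def condense(corpus):
    # Stand-in pruning (Hart's condensed nearest neighbour, not the paper's
    # algorithm): keep a subset that labels every training document correctly
    # under 1-NN. Indices are tracked so duplicates cannot cause endless loops.
    kept = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(corpus):
            if i in kept:
                continue
            j = max(kept, key=lambda idx: cosine(vec, corpus[idx][0]))
            if corpus[j][1] != label:
                kept.append(i)
                changed = True
    return [corpus[i] for i in kept]


# Hypothetical toy training set (not Reuters data).
train = [
    ("wheat corn grain harvest", "grain"),
    ("corn grain export prices", "grain"),
    ("wheat grain crop", "grain"),
    ("oil crude barrel opec", "crude"),
    ("crude oil prices rise", "crude"),
    ("opec oil output", "crude"),
]
corpus = [(tf_vector(text), label) for text, label in train]
pruned = condense(corpus)

print(len(corpus), "->", len(pruned))  # 6 -> 2
print(knn_classify("crude oil output falls", pruned, k=1))  # crude
print(knn_classify("grain harvest", pruned, k=1))  # grain
```

On this toy set the condensed corpus keeps two of the six documents yet still labels the two test strings the same way as a search over the full corpus would; how well any pruning rule preserves accuracy on a real corpus is an empirical question that the paper addresses with its own algorithm.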
Keywords :
learning (artificial intelligence); text analysis; Reuters corpus; fast kNN text classification approach; kNN searching; on-line information; on-line text classification applications; test document; text classification; training document; training-corpus pruning based approach; Algorithm design and analysis; Computer science; Content based retrieval; Drives; High performance computing; Information retrieval; Machine learning; Supervised learning; Testing; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on
Conference_Location :
Kyoto, Japan
Print_ISBN :
0-7695-1895-8
Type :
conf
DOI :
10.1109/DASFAA.2003.1192376
Filename :
1192376