DocumentCode
2248133
Title
A comparative study on two large-scale hierarchical text classification tasks' solutions
Author
Zhang, Jian ; Zhao, Hai ; Lu, Bao-Liang
Author_Institution
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Volume
6
fYear
2010
fDate
11-14 July 2010
Firstpage
3275
Lastpage
3280
Abstract
Patent classification is a large-scale hierarchical text classification (LSHTC) task. Although learning algorithms and feature selection strategies have been compared comprehensively in the text categorization field, little work has been done on LSHTC tasks because of their high computational cost and complicated structural label characteristics. For the first time, this paper compares two popular learning frameworks, hierarchical support vector machine (SVM) and k nearest neighbor (k-NN), as applied to an LSHTC task. Experimental results show that the latter outperforms the former on this LSHTC task, which is quite different from the usual results for ordinary text categorization tasks. The paper then presents a comparative study of different similarity measures and ranking approaches within the k-NN framework for the LSHTC task. We conclude that k-NN is more appropriate for the LSHTC task than hierarchical SVM and that, for a specific LSHTC task, BM25 outperforms the other similarity measures and ListWeak performs better than the other ranking approaches. We also find an interesting phenomenon: using all the labels of the retrieved neighbors remarkably improves classification performance over using only the first label of the retrieved neighbors.
Keywords
learning (artificial intelligence); patents; pattern classification; support vector machines; text analysis; BM25; ListWeak; feature selection strategies; hierarchical support vector machine; k nearest neighbor; large scale hierarchical text classification tasks; learning algorithms; patent classification; text categorization; Classification algorithms; Nearest neighbor searches; Patents; Support vector machines; Taxonomy; Text categorization; Training; Hierarchical SVM; Hierarchical text classification; Large-scale text classification; Ranking approach; Similarity measure; Text classification; comparative study; k-NN;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location
Qingdao
Print_ISBN
978-1-4244-6526-2
Type
conf
DOI
10.1109/ICMLC.2010.5580696
Filename
5580696