Title :
An improvement of flat approach on hierarchical text classification using top-level pruning classifiers
Author :
Phachongkitphiphat, Natchanon ; Vateekul, Peerapon
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
Abstract :
Hierarchical classification has become a popular research topic, particularly for text categorization on the web. A large web corpus can have a hierarchy with hundreds of thousands of topics, so the task is commonly handled with a flat classification approach, which induces binary classifiers only for the leaf-node classes. However, this approach suffers from low prediction accuracy because of imbalance in the training data. In this paper, we propose two novel strategies: (i) “Top-Level Pruning” to narrow down the candidate classes, and (ii) “Exclusive Top-Level Training Policy” to build more effective classifiers by utilizing the top-level data. Experiments on the Wikipedia dataset show that our system outperforms the traditional flat approach on all hierarchical classification metrics.
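The idea of pruning at the top level before applying flat leaf classifiers can be illustrated with a minimal sketch. This is not the authors' implementation: the two-level hierarchy, the keyword sets, the keyword-overlap scorer, and every function name below are hypothetical stand-ins for trained binary classifiers, and the "Exclusive Top-Level Training Policy" is not modelled because the abstract does not describe it in detail.

```python
# Toy sketch of top-level pruning ahead of flat leaf classification.
# Everything here (hierarchy, keywords, scorer, names) is hypothetical.

# Hypothetical two-level hierarchy: top-level category -> leaf classes.
HIERARCHY = {
    "science": ["physics", "biology"],
    "sports": ["football", "tennis"],
}

# Hypothetical keyword sets standing in for trained binary leaf classifiers.
LEAF_KEYWORDS = {
    "physics": {"quantum", "energy", "particle"},
    "biology": {"cell", "gene", "energy"},
    "football": {"goal", "league", "match"},
    "tennis": {"serve", "match", "court"},
}


def score(tokens, keywords):
    """Stand-in classifier score: fraction of the keywords found in the document."""
    return len(tokens & keywords) / len(keywords)


def top_level_keywords(category):
    """Assumption: a top-level stand-in uses the union of its leaves' keywords."""
    keywords = set()
    for leaf in HIERARCHY[category]:
        keywords |= LEAF_KEYWORDS[leaf]
    return keywords


def candidate_leaves(text, keep=1):
    """Keep only the leaves under the `keep` best-scoring top-level categories."""
    tokens = set(text.lower().split())
    ranked = sorted(
        HIERARCHY,
        key=lambda c: score(tokens, top_level_keywords(c)),
        reverse=True,
    )
    return [leaf for category in ranked[:keep] for leaf in HIERARCHY[category]]


def predict_flat(text):
    """Plain flat approach: score every leaf class in the hierarchy."""
    tokens = set(text.lower().split())
    return max(LEAF_KEYWORDS, key=lambda leaf: score(tokens, LEAF_KEYWORDS[leaf]))


def predict_with_pruning(text, keep=1):
    """Flat approach restricted to the leaves that survive top-level pruning."""
    tokens = set(text.lower().split())
    return max(
        candidate_leaves(text, keep),
        key=lambda leaf: score(tokens, LEAF_KEYWORDS[leaf]),
    )
```

In this sketch the pruning step only shrinks the set of leaf classes that are scored; with real classifiers and a hierarchy of hundreds of thousands of topics, the number of top-level categories to keep (`keep` here) would be a tuning choice.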
Keywords :
Internet; pattern classification; text analysis; Wikipedia dataset; binary classifier; exclusive top-level training policy; flat classification approach; hierarchical text classification; leaf-node classes; text categorization; top-level pruning classifiers; flat approach; hierarchical classification; hierarchy pruning; text classification
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2014 11th International Joint Conference on
Conference_Location :
Chon Buri
Print_ISBN :
978-1-4799-5821-4
DOI :
10.1109/JCSSE.2014.6841847