DocumentCode :
3531493
Title :
Flatten hierarchies for large-scale hierarchical text categorization
Author :
Wang, Xiao-Lin ; Lu, Bao-Liang
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2010
fDate :
5-8 July 2010
Firstpage :
139
Lastpage :
144
Abstract :
Hierarchies are widely used to organize documents and web pages, so automated hierarchical classification techniques are in demand. However, the currently dominant hierarchical approach, the top-down method, is less accurate than flat classification approaches because of error propagation and data sparsity at the bottom nodes. In this paper we flatten hierarchies to relieve this accuracy decrease in the top-down method, aiming to make hierarchies both effective enough to keep large-scale classification tasks feasible and simple enough to ensure high classification accuracy. We propose two flattening strategies based on these two causes of the accuracy decrease. Experimental results show that the flattening strategy designed for error propagation is more effective, which suggests that hierarchies with many branches at the top layers can provide high classification accuracy. We also analyze the computational complexity before and after flattening, and the analysis approximately agrees with the experimental results.
Keywords :
pattern classification; text analysis; Web page organization; automated hierarchical classification technique; bottom node data sparsity; classification accuracy; document organization; error propagation; flattening strategy; large-scale hierarchical text categorization; top-down method; Accuracy; Computational complexity; Indexing; Support vector machines; Training; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information Management (ICDIM), 2010 Fifth International Conference on
Conference_Location :
Thunder Bay, ON
Print_ISBN :
978-1-4244-7572-8
Type :
conf
DOI :
10.1109/ICDIM.2010.5664247
Filename :
5664247