Title :
Large-Scale Hierarchical Text Classification Based on Path Semantic Information
Author :
Gao, Feng ; Wu, Chengrong ; Guo, Naiwang ; Zhao, Danfeng
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
Abstract :
Although hierarchical structure information can improve hierarchical text classification, existing methods suffer from error propagation, especially in large-scale, deep hierarchies. In this paper, we define a path-based semantic vector for the representation of categories, through which prior information provided by the training set can be employed in a classifier-independent way to reduce and further eliminate classification errors. In particular, we first propose an occurrence-probability-based strategy for hierarchical text classification, which helps limit the error rate efficiently. Cooccurrence probability is then introduced to correct classification errors that occur at higher levels of the hierarchy. Extensive experiments show that our hierarchical classification strategies perform well on the ODP dataset, even at deep levels of the hierarchy.
Keywords :
classification; text editing; ODP dataset; classification errors; cooccurrence probability; error propagation; large-scale hierarchical text classification; occurrence probability; path semantic information; Bayesian methods; Computer errors; Computer science; Information retrieval; Intelligent structures; Large-scale systems; Support vector machine classification; Support vector machines; TV; Text categorization; error propagation; hierarchical classification; path semantic representation; prior information;
Conference_Title :
Business Intelligence and Financial Engineering, 2009. BIFE '09. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-0-7695-3705-4
DOI :
10.1109/BIFE.2009.60