DocumentCode :
510288
Title :
Large-Scale Hierarchical Text Classification Based on Path Semantic Vector and Prior Information
Author :
Gao, Feng ; Fu, WeiMing ; Zhong, YiPing ; Zhao, Danfeng
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
Volume :
1
fYear :
2009
fDate :
11-14 Dec. 2009
Firstpage :
54
Lastpage :
58
Abstract :
Although hierarchical text classification can be improved by exploiting hierarchical structure information, existing methods suffer from two problems: data skew (especially in large-scale hierarchies) and error propagation. In this paper, we first define the concept of a path-based semantic vector for the representation of categories. Additional reliable training data for data-sparse categories can then be retrieved based on this representation and particular similarity metrics. This training data enhancement strategy is classifier-independent and can improve the classification of categories that lack adequate training data. Second, we propose an occurrence-probability-based strategy for hierarchical text classification, which can reduce error propagation efficiently. Cooccurrence probability is then introduced to correct errors that occur at higher levels of the hierarchy. Extensive experiments show that our hierarchical classification strategies perform well on the ODP dataset, even when little training data is available.
Keywords :
pattern classification; probability; text analysis; ODP dataset; automatic text categorization; cooccurrence probability; data skew; data sparse categories; error propagation reduction; hierarchical structure information; large scale hierarchical text classification; path based semantic vector; training data enhancement strategy; Bayesian methods; Computational intelligence; Computer errors; Electronic mail; Information retrieval; Large-scale systems; Support vector machine classification; Support vector machines; Text categorization; Training data; error propagation; hierarchical classification; path semantic representation; prior information
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Security, 2009. CIS '09. International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-5411-2
Type :
conf
DOI :
10.1109/CIS.2009.38
Filename :
5376730