DocumentCode
2728263
Title
An unsupervised hierarchical approach to document categorization
Author
Wetzker, Robert ; Alpcan, Tansu ; Bauckhage, Christian ; Umbrath, Winfried ; Albayrak, Sahin
fYear
2007
fDate
2-5 Nov. 2007
Firstpage
482
Lastpage
486
Abstract
We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. Using search engines to train a hierarchical classifier makes our approach more flexible than existing solutions, which rely on human-labeled data and are bound to a specific domain. We show that the structural information given by the taxonomy allows for a context-aware construction of search queries and leads to higher tagging accuracy. We test our approach on different benchmark datasets and evaluate its performance on the single- and multi-tag assignment tasks. The experimental results show that our solution is as accurate as supervised classifiers for web page classification and still performs well when categorizing domain-specific documents.
Keywords
Benchmark testing; Context awareness; Humans; Internet; Laboratories; Search engines; Tagging; Taxonomy; Text categorization; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, IEEE/WIC/ACM International Conference on
Conference_Location
Fremont, CA
Print_ISBN
978-0-7695-3026-0
Type
conf
DOI
10.1109/WI.2007.144
Filename
4427140