• DocumentCode
    2728263
  • Title
    An unsupervised hierarchical approach to document categorization
  • Author
    Wetzker, Robert ; Alpcan, Tansu ; Bauckhage, Christian ; Umbrath, Winfried ; Albayrak, Sahin
  • fYear
    2007
  • fDate
    2-5 Nov. 2007
  • Firstpage
    482
  • Lastpage
    486
  • Abstract
    We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. Using search engines to train a hierarchical classifier makes our approach more flexible than existing solutions, which rely on human-labeled data and are bound to a specific domain. We show that the structural information given by the taxonomy allows for a context-aware construction of search queries and leads to higher tagging accuracy. We test our approach on different benchmark datasets and evaluate its performance on the single- and multi-tag assignment tasks. The experimental results show that our solution is as accurate as supervised classifiers for web page classification and still performs well when categorizing domain-specific documents.
  • Keywords
    Benchmark testing; Context awareness; Humans; Internet; Laboratories; Search engines; Tagging; Taxonomy; Text categorization; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, IEEE/WIC/ACM International Conference on
  • Conference_Location
    Fremont, CA
  • Print_ISBN
    978-0-7695-3026-0
  • Type
    conf
  • DOI
    10.1109/WI.2007.144
  • Filename
    4427140