• DocumentCode
    2282993
  • Title
    Tailoring Taxonomies for Efficient Text Categorization and Expert Finding
  • Author
    Wetzker, R. ; Umbrath, W. ; Hennig, L. ; Bauckhage, C. ; Alpcan, T. ; Metze, F.
  • Author_Institution
    DAI-Labor, Tech. Univ. Berlin, Berlin
  • Volume
    3
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    459
  • Lastpage
    462
  • Abstract
    Automatic content categorization by means of taxonomies is a powerful tool for information retrieval and search technologies as it improves the accessibility of data both for humans and machines. While research on automatic categorization has mainly focused on the problem of classifier design, hardly any effort has been spent on the optimization of the taxonomy size itself. However, taxonomy tailoring may significantly improve computational efficiency and scalability of modern retrieval systems where taxonomies often consist of tens of thousands of non-uniformly distributed categories. In this paper we demonstrate empirically that small subtrees of a taxonomy already enable reliable categorization. We compare several measures for the optimal selection of sub-taxonomies and investigate to what extent a reduction affects the classification quality. We consider applications in classical document categorization and in the upcoming area of expert finding and report corresponding results obtained from experiments with standard benchmark data.
  • Keywords
    content management; information retrieval; text analysis; automatic content categorization; classical document categorization; expert finding; search technology; text categorization; Computational efficiency; Content based retrieval; Design optimization; Humans; Information retrieval; Intelligent agent; Laboratories; Taxonomy; Text categorization; Usability; optimization; tailoring taxonomies;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-0-7695-3496-1
  • Type
    conf
  • DOI
    10.1109/WIIAT.2008.179
  • Filename
    4740821