  • DocumentCode
    168255
  • Title
    Dynamic taxonomy composition via keyqueries
  • Author
    Gollub, Tim ; Völske, Michael ; Hagen, Matthias ; Stein, Benno
  • Author_Institution
    Bauhaus-Univ. Weimar, Weimar, Germany
  • fYear
    2014
  • fDate
    8-12 Sept. 2014
  • Firstpage
    39
  • Lastpage
    48
  • Abstract
    This paper presents an unsupervised framework for dynamic, subject-oriented taxonomy composition in digital libraries, which can naturally integrate existing library classification systems. The taxonomy classes in our approach correspond to so-called keyqueries that are run against the digital library's full-text retrieval system. Given a document, a keyquery is a small set of keywords for which the document achieves a high relevance score. Keyqueries can hence be viewed as a general and concise description of the returned retrieval results. The keyquery framework addresses important problems of static classification systems: overlarge classes and overly complex taxonomy structures. If, for instance, a leaf class grows to an unwieldy size, keyqueries for the contained documents provide a suitable split mechanism. Since queries are well known to library users from their daily web search experience, they increase the structural complexity in a transparent way. The paper also presents a strategy for taxonomy-based library exploration. Given a user's information need in the form of library documents, we synthesize a hierarchy of keyqueries that covers this library subset. We manage to solve this difficult set covering problem on the fly by combining inverted and reverted indexes along with heuristic search space pruning within a map-reduce application. An empirical evaluation with an ACM collection of scientific papers demonstrates the efficiency and effectiveness of our taxonomy composition framework.
  • Keywords
    digital libraries; document handling; query processing; Web search experience; complex taxonomy structures; digital library full-text retrieval system; document achieves; dynamic taxonomy composition; heuristic search space pruning; high relevance score; keyquery framework; library classification systems; library documents; map-reduce application; static classification systems; structural complexity; subject-oriented taxonomy composition; taxonomy composition framework; taxonomy-based library exploration; unsupervised framework; Clustering algorithms; Computational modeling; Heuristic algorithms; Indexes; Libraries; Search engines; Taxonomy; big data problem; classification systems; keyquery; reverted index
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
  • Conference_Location
    London
  • Type
    conf
  • DOI
    10.1109/JCDL.2014.6970148
  • Filename
    6970148
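The keyquery idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy corpus, the conjunctive tf-sum ranking in `search`, the parameters `k` and `max_terms`, and all function names are hypothetical. The sketch treats a query as a keyquery for every document it ranks in its top k, stores these query-to-document links in a reverted index (the reverse direction of an inverted index), and then uses a plain greedy set cover, restricted to queries whose results stay inside the user's document subset, as a stand-in for the paper's heuristic search-space pruning and map-reduce processing. It builds a single flat level of classes rather than a hierarchy.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus standing in for a library's full-text index.
CORPUS = {
    "d1": "keyquery taxonomy digital library",
    "d2": "keyquery reverted index search",
    "d3": "taxonomy classification library",
    "d4": "inverted index search engine",
    "d5": "digital library classification system",
}


def tokenize(text):
    return text.lower().split()


def build_inverted_index(corpus):
    """Inverted index: term -> Counter mapping doc_id to term frequency."""
    index = {}
    for doc_id, text in corpus.items():
        for term, tf in Counter(tokenize(text)).items():
            index.setdefault(term, Counter())[doc_id] = tf
    return index


def search(index, query, k):
    """Toy retrieval model (assumption): AND-match all terms, rank by summed tf, then doc id."""
    postings = [index.get(term, Counter()) for term in query]
    matches = set(postings[0]).intersection(*postings[1:])
    ranked = sorted(matches, key=lambda d: (-sum(p[d] for p in postings), d))
    return ranked[:k]


def build_reverted_index(corpus, index, max_terms=2, k=2):
    """Reverted index: query -> documents that rank in the query's top k.

    Candidate queries are term combinations (up to max_terms) drawn from each
    document; a query counts as a keyquery for every document in its top k.
    """
    reverted = {}
    for doc_id in sorted(corpus):
        terms = sorted(set(tokenize(corpus[doc_id])))
        for size in range(1, max_terms + 1):
            for query in combinations(terms, size):
                if query not in reverted:
                    reverted[query] = frozenset(search(index, query, k))
    return reverted


def greedy_keyquery_cover(targets, reverted):
    """Greedy set cover: repeatedly pick the query covering most uncovered targets.

    Assumption: only queries whose results all lie inside `targets` are eligible.
    Ties prefer shorter queries, then lexicographic order.
    """
    targets = set(targets)
    eligible = {q: docs for q, docs in reverted.items() if docs and docs <= targets}
    uncovered, chosen = set(targets), []
    while uncovered:
        best = min(eligible, key=lambda q: (-len(eligible[q] & uncovered), len(q), q), default=None)
        if best is None or not eligible[best] & uncovered:
            break  # no eligible query covers any remaining document
        chosen.append(best)
        uncovered -= eligible[best]
    return chosen, uncovered


index = build_inverted_index(CORPUS)
reverted = build_reverted_index(CORPUS, index, max_terms=2, k=2)
chosen, uncovered = greedy_keyquery_cover({"d1", "d3", "d5"}, reverted)
print(chosen, uncovered)  # [('classification',), ('digital',)] set()
```

Greedy selection is the textbook approximation for set cover. Restricting candidates to queries whose results fall inside the target subset keeps each selected class from describing documents outside the user's information need, at the cost of possibly leaving some documents uncovered, which the sketch returns in `uncovered`.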