Title :
Dynamic taxonomy composition via keyqueries
Author :
Gollub, Tim ; Volske, Michael ; Hagen, Matthias ; Stein, Benno
Author_Institution :
Bauhaus-Univ. Weimar, Weimar, Germany
Abstract :
This paper presents an unsupervised framework for dynamic, subject-oriented taxonomy composition in digital libraries, which can naturally integrate existing library classification systems. The taxonomy classes in our approach correspond to so-called keyqueries that are run against the digital library's full-text retrieval system. Given a document, a keyquery is a small set of keywords for which the document achieves a high relevance score. Keyqueries can hence be viewed as a general and concise description of the returned retrieval results. The keyquery framework addresses important problems of static classification systems: overlarge classes and overly complex taxonomy structures. If, for instance, a leaf class grows to an indigestible size, keyqueries for the contained documents provide a suitable split mechanism. Since queries are well known to library users from their daily web search experience, they increase the structural complexity in a transparent way. The paper also presents a strategy for taxonomy-based library exploration. Given a user's information need in the form of library documents, we synthesize a hierarchy of keyqueries that covers this library subset. We manage to solve this difficult set covering problem on the fly by combining inverted and reverted indexes along with heuristic search space pruning within a map-reduce application. An empirical evaluation with an ACM collection of scientific papers demonstrates the efficiency and effectiveness of our taxonomy composition framework.
Keywords :
digital libraries; document handling; query processing; Web search experience; complex taxonomy structures; digital library full-text retrieval system; document achieves; dynamic taxonomy composition; heuristic search space pruning; high relevance score; keyquery framework; library classification systems; library documents; map-reduce application; static classification systems; structural complexity; subject-oriented taxonomy composition; taxonomy composition framework; taxonomy-based library exploration; unsupervised framework; Clustering algorithms; Computational modeling; Heuristic algorithms; Indexes; Libraries; Search engines; Taxonomy; big data problem; classification systems; keyquery; reverted index
Conference_Title :
Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
Conference_Location :
London
DOI :
10.1109/JCDL.2014.6970148