Title :
Approximating Document Frequency for Self-Index based Top-k Document Retrieval
Author :
Suzuki, Tokinori ; Fujii, Atsushi
Author_Institution :
Dept. of Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
Abstract :
Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework combines an FM-index and a wavelet tree to support efficient top-k document retrieval. Such an index, however, has difficulty handling document frequency (DF) at search time because the indexed terms are all substrings of the document collection. Previous work either exhaustively searches all parts of the index, where most documents are not relevant, to calculate DF, or stores precalculated DF values at the cost of a large amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results show that our methods achieve almost the same effectiveness as exhaustive search while maintaining search efficiency, taking about half the time of exhaustive search.
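To make the DF problem concrete, the following is a minimal sketch of the exhaustive baseline only, not of the two approximation methods proposed in the paper. It assumes a plain suffix array and a document array in place of the compressed FM-index and wavelet tree, and all function names (`build_index`, `suffix_range`, `exhaustive_df`) are hypothetical. It shows that a pattern's suffix range gives the number of occurrences directly, whereas DF requires scanning that range for distinct documents.

```python
from bisect import bisect_left, bisect_right

def build_index(docs):
    """Build a naive suffix array and document array (illustrative only)."""
    text = ""
    doc_of_pos = []
    for doc_id, d in enumerate(docs):
        # "\x00" is an assumed sentinel that separates documents.
        text += d + "\x00"
        doc_of_pos.extend([doc_id] * (len(d) + 1))
    # Sort suffix start positions lexicographically (quadratic, fine for a demo).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # doc_array[j] is the document containing the j-th smallest suffix.
    doc_array = [doc_of_pos[i] for i in sa]
    return text, sa, doc_array

def suffix_range(text, sa, pattern):
    """Return the half-open range [lo, hi) of suffixes that start with pattern."""
    m = len(pattern)
    # Truncated suffixes stay in sorted order, so binary search applies.
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)

def exhaustive_df(doc_array, lo, hi):
    """Count distinct documents in the range by scanning every entry."""
    return len(set(doc_array[lo:hi]))

docs = ["abracadabra", "cadabra", "banana"]
text, sa, doc_array = build_index(docs)
lo, hi = suffix_range(text, sa, "abra")
occurrences = hi - lo                 # total occurrences across the collection
df = exhaustive_df(doc_array, lo, hi)  # documents containing the pattern
print(occurrences, df)
```

The scan in `exhaustive_df` touches every occurrence of the pattern, which is the cost that an approximation of DF aims to avoid.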
Keywords :
approximation theory; document handling; query processing; trees (mathematics); wavelet transforms; FM-index; document collection; document frequency approximation; exhaustive search; index frameworks; index structures; query term; search efficiency; search time; self-index based top-k document retrieval; wavelet tree; Accuracy; Approximation methods; Arrays; Correlation; Indexes; Mathematical model; Resource description framework; approximate search
Conference_Title :
Advanced Information Networking and Applications Workshops (WAINA), 2015 IEEE 29th International Conference on
Conference_Location :
Gwangju
Print_ISBN :
978-1-4799-1774-7
DOI :
10.1109/WAINA.2015.68