DocumentCode
705014
Title
Approximating Document Frequency for Self-Index based Top-k Document Retrieval
Author
Suzuki, Tokinori ; Fujii, Atsushi
Author_Institution
Dept. of Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
fYear
2015
fDate
24-27 March 2015
Firstpage
541
Lastpage
546
Abstract
Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework combines an FM-index and a wavelet tree to support efficient top-k document retrieval. This index, however, has difficulty handling document frequency (DF) at search time because the indexed terms are all substrings of the document collection. Previous work either exhaustively searches all parts of the index, most of which correspond to non-relevant documents, to calculate DF, or stores precalculated DF values at the cost of a large amount of additional space. In this paper, we propose two methods that approximate the DF of a query term by exploiting information obtained while traversing the index structures. Experimental results show that our methods achieve effectiveness almost equal to that of exhaustive search while taking about half of its search time.
Keywords
approximation theory; document handling; query processing; trees (mathematics); wavelet transforms; FM-index; document collection; document frequency approximation; exhaustive search; index frameworks; index structures; query term; search efficiency; search time; self-index based top-k document retrieval; wavelet tree; Accuracy; Approximation methods; Arrays; Correlation; Indexes; Mathematical model; Resource description framework; FM-index; approximate search; wavelet tree;
fLanguage
English
Publisher
ieee
Conference_Titel
Advanced Information Networking and Applications Workshops (WAINA), 2015 IEEE 29th International Conference on
Conference_Location
Gwangju
Print_ISBN
978-1-4799-1774-7
Type
conf
DOI
10.1109/WAINA.2015.68
Filename
7096233