DocumentCode
524657
Title
Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene
Author
Xu, Xianghua ; Pan, Shengyi ; Wan, Jian
Author_Institution
Grid & Service Comput. Lab., Hangzhou Dianzi Univ., Hangzhou, China
Volume
1
fYear
2010
fDate
28-31 May 2010
Firstpage
382
Lastpage
386
Abstract
The inverted index is the most popular index structure in search engines. Applying index compression can reduce the storage space of the inverted index and improve search performance. In this paper, we present a comprehensive performance evaluation of three state-of-the-art index compression schemes on the open-source information retrieval system Lucene. We focus on the compression and storage of the document ID, frequency, and position information of Lucene’s word-level inverted index. The main work examines: 1) the impact of if-then-else constructs in the decompression process on performance in a Java environment; 2) each algorithm’s compression ratio on data of different scales; 3) the performance comparison of term and phrase search; 4) whether interleaving the index file produces notable differences in compression ratio and decompression speed. The experimental results and analysis are given in detail.
Keywords
Compression algorithms; Computer science; Data mining; Frequency; Grid computing; Information retrieval; Java; Search engines; Space technology; Vocabulary; index compression; inverted index; Lucene; performance evaluation; search engine
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Science and Optimization (CSO), 2010 Third International Joint Conference on
Conference_Location
Huangshan, Anhui, China
Print_ISBN
978-1-4244-6812-6
Electronic_ISBN
978-1-4244-6813-3
Type
conf
DOI
10.1109/CSO.2010.126
Filename
5533050