DocumentCode :
524657
Title :
Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene
Author :
Xu, Xianghua ; Pan, Shengyi ; Wan, Jian
Author_Institution :
Grid & Service Comput. Lab., Hangzhou Dianzi Univ., Hangzhou, China
Volume :
1
fYear :
2010
fDate :
28-31 May 2010
Firstpage :
382
Lastpage :
386
Abstract :
The inverted index is the most popular index structure in search engines. Compressing an inverted index can reduce its storage space and improve search performance. In this paper, we carry out a comprehensive performance evaluation of three state-of-the-art index compression schemes on Lucene, an open-source information retrieval system. We focus on the compression and storage of the document ID, frequency, and position information in Lucene's word-level inverted index. The main work covers: 1) the impact of the if-then-else construction in the decompression process on performance in a Java environment; 2) each algorithm's compression ratio on data sets of different scales; 3) a performance comparison of term search and phrase search; and 4) whether interleaving the index files produces notable differences in compression ratio and decompression speed. The experimental results and analysis are given in detail.
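To make the idea of compressing the document IDs of a posting list concrete, here is a minimal sketch assuming a simple d-gap plus variable-byte scheme, similar in spirit to Lucene's VInt encoding. It is not one of the three schemes evaluated in the paper (the abstract does not name them), and the class and method names (`VByteGapCodec`, `encode`, `decode`) are hypothetical. Sorted document IDs are stored as differences from the previous ID, and each gap is written 7 bits per byte with the high bit marking that more bytes follow. The decoder's per-byte branch on that continuation bit shows why the branch structure of decompression code can affect speed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration: d-gap + variable-byte coding of a posting list.
// Not the paper's method; a generic textbook-style scheme.
public class VByteGapCodec {

    // Encode a strictly increasing list of non-negative document IDs as d-gaps,
    // each gap written with 7 payload bits per byte, low-order bits first,
    // and the high bit set when more bytes of the same gap follow.
    static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : docIds) {
            int gap = id - prev;          // small gaps need fewer bytes
            prev = id;
            while ((gap & ~0x7F) != 0) {  // more than 7 significant bits remain
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);               // final byte: high bit clear
        }
        return out.toByteArray();
    }

    // Decode `count` gaps from the byte stream back into absolute document IDs.
    static int[] decode(byte[] data, int count) {
        int[] ids = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);    // data-dependent branch on the continuation bit
            prev += gap;                  // undo the d-gap transform
            ids[i] = prev;
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 150, 151, 20000};
        byte[] packed = encode(docIds);
        System.out.println(packed.length + " bytes");                  // 8 bytes
        System.out.println(Arrays.toString(decode(packed, docIds.length)));
    }
}
```

For the sample list the gaps are 3, 4, 143, 1 and 19849, which take 1, 1, 2, 1 and 3 bytes respectively, so the five IDs fit in 8 bytes instead of 20 bytes as raw 32-bit integers. Frequencies and positions can be coded the same way (positions as gaps within a document), which is the kind of data the abstract says the evaluation covers.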
Keywords :
Compression algorithms; Computer science; Data mining; Frequency; Grid computing; Information retrieval; Java; Search engines; Space technology; Vocabulary; index compression; inverted index; lucene; performance evaluation; search engine;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Science and Optimization (CSO), 2010 Third International Joint Conference on
Conference_Location :
Huangshan, Anhui, China
Print_ISBN :
978-1-4244-6812-6
Electronic_ISBN :
978-1-4244-6813-3
Type :
conf
DOI :
10.1109/CSO.2010.126
Filename :
5533050