Title :
Performance Evaluation of Compressed Inverted Index in Lucene
Author :
Wan, Jian ; Pan, Shengyi
Author_Institution :
Grid & Service Comput. Lab., Hangzhou Dianzi Univ., Hangzhou, China
Abstract :
The inverted index is the most popular index structure in search engines. Applying index compression can reduce the storage space of the inverted index and improve search performance. In this paper, we compare three typical index compression schemes in Lucene, an open source information retrieval system. First, the index compression schemes are implemented in Lucene. Then we present comparison results for these schemes in terms of compression ratio, decompression speed, and scalability. Whether the index file is interleaved affects compression ratio and decompression speed to markedly different degrees across the algorithms, and the scale of the data also influences each algorithm's efficiency.
Keywords :
indexing; information retrieval systems; public domain software; search engines; Lucene system; index compression; information retrieval system; inverted index; open source system; search engine; Computer science; Frequency; Grid computing; Information retrieval; Interleaved codes; Java; Performance analysis; Scalability; Vocabulary; Lucene; performance evaluation
Conference_Title :
Research Challenges in Computer Science, 2009. ICRCCS '09. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3927-0
Electronic_ISBN :
978-1-4244-5410-5
DOI :
10.1109/ICRCCS.2009.53