  • DocumentCode
    524657
  • Title
    Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene
  • Author
    Xu, Xianghua ; Pan, Shengyi ; Wan, Jian
  • Author_Institution
    Grid & Service Comput. Lab., Hangzhou Dianzi Univ., Hangzhou, China
  • Volume
    1
  • fYear
    2010
  • fDate
    28-31 May 2010
  • Firstpage
    382
  • Lastpage
    386
  • Abstract
    The inverted index is the most popular index structure in search engines. Index compression reduces the storage space an inverted index requires and can improve search performance. In this paper, we present a comprehensive performance evaluation of three state-of-the-art index compression schemes on the open-source information retrieval system Lucene. We focus on the compression and storage of the document ID, frequency, and position information in Lucene's word-level inverted index. The main work covers: 1) the impact of if-then-else constructs in the decompression process on performance in a Java environment; 2) each algorithm's compression ratio on data of different scales; 3) a performance comparison of term and phrase search; 4) whether interleaving the index file makes a notable difference in compression ratio and decompression speed. The experimental results and analysis are given in detail.
  • Keywords
    Compression algorithms; Computer science; Data mining; Frequency; Grid computing; Information retrieval; Java; Search engines; Space technology; Vocabulary; index compression; inverted index; Lucene; performance evaluation; search engine
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Science and Optimization (CSO), 2010 Third International Joint Conference on
  • Conference_Location
    Huangshan, Anhui, China
  • Print_ISBN
    978-1-4244-6812-6
  • Electronic_ISBN
    978-1-4244-6813-3
  • Type
    conf
  • DOI
    10.1109/CSO.2010.126
  • Filename
    5533050