• DocumentCode
    3228404
  • Title

    Improving Index Compression Using Cluster Information

  • Author

    Chen, Jinlin ; Zhong, Ping ; Cook, Terry

  • Author_Institution
    Queens Coll., City Univ. of New York, NY
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    188
  • Lastpage
    194
  • Abstract
    The clustering property of document collections in Web search engines provides valuable information for improving index compression. By clustering the d-gaps of an inverted list and then encoding clustered and non-clustered d-gaps with different codes, we can tailor the encoding to the specific properties of each kind of d-gap and achieve a better compression ratio. Index compression can be improved further by adaptively adjusting the cluster threshold for inverted lists. Based on these ideas, this paper proposes adaptive cluster-based mixed codes for inverted file index compression. Experimental results show that codes using the adaptive cluster-based mixed approach achieve a better compression ratio and lower complexity than interpolative code, which is considered one of the most efficient bitwise codes at present.
  • Keywords
    data compression; indexing; pattern clustering; search engines; Web search engines; adaptive cluster based mixed codes; document collections; inverted file index compression; Binary codes; Computer science; Educational institutions; Encoding; Frequency; Indexing; Query processing; Search engines; Web search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type

    conf

  • DOI
    10.1109/WI.2006.96
  • Filename
    4061365
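
The abstract's core idea (turn an inverted list into d-gaps, then code small "clustered" gaps and large gaps with different codes) can be sketched as follows. This is a minimal illustration, not the paper's actual scheme: the threshold rule (`gap <= threshold`), the one-bit flag per gap, and the choice of unary and Elias gamma as the two codes are all assumptions made here for the sake of a runnable example, and all function names are hypothetical.

```python
def d_gaps(doc_ids):
    """Convert a strictly increasing list of document IDs into d-gaps."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def elias_gamma(n):
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def mixed_encode(gaps, threshold):
    """Assumed toy mixed code: flag '1' + unary for gaps <= threshold,
    flag '0' + Elias gamma for all other gaps."""
    out = []
    for g in gaps:
        if g <= threshold:
            out.append("1" + "0" * (g - 1) + "1")   # clustered gap, unary
        else:
            out.append("0" + elias_gamma(g))        # non-clustered gap, gamma
    return "".join(out)


def mixed_decode(bits):
    """Inverse of mixed_encode; recovers the list of d-gaps."""
    gaps, i = [], 0
    while i < len(bits):
        flag = bits[i]
        i += 1
        zeros = 0
        while bits[i] == "0":       # count leading zeros of unary or gamma
            zeros += 1
            i += 1
        if flag == "1":             # unary: zeros + 1 is the value
            gaps.append(zeros + 1)
            i += 1
        else:                       # gamma: read zeros + 1 binary digits
            gaps.append(int(bits[i:i + zeros + 1], 2))
            i += zeros + 1
    return gaps


doc_ids = [3, 4, 5, 6, 40, 41, 42, 100]
gaps = d_gaps(doc_ids)
encoded = mixed_encode(gaps, threshold=2)
assert mixed_decode(encoded) == gaps
```

An adaptive variant in the spirit of the abstract could try several thresholds per inverted list and keep the one giving the shortest output, but how the paper actually selects the threshold is not stated in this record.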