DocumentCode :
2489810
Title :
A Mixed Coding Scheme for Inverted File Index Compression
Author :
Chen, Jinlin ; Zhong, Ping ; Cook, Terry
Author_Institution :
Dept. of Comput. Sci., City Univ. of New York, NY
fYear :
2006
fDate :
13-14 Nov. 2006
Firstpage :
1
Lastpage :
8
Abstract :
The cluster property of d-gaps in inverted lists provides valuable information for improving index compression and, in turn, the performance of Web search engines. By clustering the d-gaps of an inverted list strictly according to a threshold, and then encoding clustered and non-clustered d-gaps with different methods, we can tailor the coding to the specific properties of each kind of d-gap and achieve a better compression ratio. Based on this idea, we propose a cluster-based mixed coding scheme for inverted file index compression: the mixed gamma/flat binary code. Experimental results show that the new coding scheme achieves a compression ratio at least equal to that of the interpolative code, which is considered one of the most efficient bitwise codes at present. By adjusting the parameters of the new code, even better results can be achieved. In addition, the new code has lower complexity than the interpolative code and therefore enables faster encoding and decoding.
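The terms used in the abstract can be illustrated with a minimal sketch. It is not the authors' mixed gamma/flat binary code, whose details the abstract does not give. It only shows how d-gaps are derived from a sorted posting list, how the standard Elias gamma code represents one positive integer, and one plausible way to split gaps into clustered and non-clustered runs by a threshold. The function names, the threshold value, and the "gap <= threshold" rule are assumptions made for illustration.

```python
def d_gaps(doc_ids):
    """Turn a sorted, non-empty list of document IDs into d-gaps.

    The first ID is kept as-is; every later entry is the difference
    from its predecessor.
    """
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def elias_gamma(n):
    """Standard Elias gamma code of a positive integer, as a bit string.

    Convention used here: (len - 1) zero bits, then the binary form of n.
    """
    if n < 1:
        raise ValueError("Elias gamma is defined for positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def split_by_threshold(gaps, threshold):
    """Group consecutive gaps into runs of (is_clustered, [gaps]).

    Assumption for illustration: a gap counts as clustered when it is
    less than or equal to the threshold.
    """
    runs = []
    for gap in gaps:
        clustered = gap <= threshold
        if runs and runs[-1][0] == clustered:
            runs[-1][1].append(gap)
        else:
            runs.append((clustered, [gap]))
    return runs


if __name__ == "__main__":
    postings = [3, 5, 6, 7, 20, 21]   # hypothetical posting list
    gaps = d_gaps(postings)
    print(gaps)
    print([elias_gamma(g) for g in gaps])
    print(split_by_threshold(gaps, threshold=3))
```

A mixed scheme in the spirit of the abstract would then choose a different code for each kind of run; which code goes with which run, and how run boundaries are signalled, is left open here because the abstract does not specify it.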
Keywords :
Internet; binary codes; data compression; document handling; indexing; search engines; Web search engines; bitwise codes; cluster property; decoding; encoding; interpolative code; inverted file index compression; mixed gamma-flat binary code; Binary codes; Costs; Decoding; Encoding; Indexing; Internet; Probability distribution; Search engines; Web search; Web sites; Index compression; Inverted file; d-gap;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Hot Topics in Web Systems and Technologies, 2006. HOTWEB '06. 1st IEEE Workshop on
Conference_Location :
Boston, MA
Print_ISBN :
1-4244-0596-3
Electronic_ISBN :
1-4244-0596-3
Type :
conf
DOI :
10.1109/HOTWEB.2006.355272
Filename :
4178389