DocumentCode :
140826
Title :
A tunable compression framework for bitmap indices
Author :
Guzun, Gheorghi ; Canahuate, Guadalupe ; Chiu, Dereck ; Sawin, Jason
Author_Institution :
Electr. & Comput. Eng., Univ. of Iowa, Iowa City, IA, USA
fYear :
2014
fDate :
March 31 - April 4, 2014
Firstpage :
484
Lastpage :
495
Abstract :
Bitmap indices are widely used for large read-only repositories in data warehouses and scientific databases. Their binary representation allows for the use of bitwise operations and specialized run-length compression techniques. Due to a trade-off between compression and query efficiency, bitmap compression schemes are aligned using a fixed encoding length (typically the word length) to avoid explicit decompression at query time. In general, smaller encoding lengths provide better compression but require more decoding during query execution. However, when the difference in size is considerable, smaller encodings can also provide better execution time. We posit that a tailored encoding length for each bit vector will provide better performance than a one-size-fits-all approach. We present a framework that optimizes compression and query efficiency by allowing bitmaps to be compressed using variable encoding lengths while still maintaining alignment to avoid explicit decompression. Efficient algorithms are introduced to process queries over bitmaps compressed using different encoding lengths. An input parameter controls the aggressiveness of the compression, giving the user the ability to tune the trade-off between space and query time. Our empirical study shows that this approach achieves significant improvements in both query time and compression ratio for synthetic and real data sets. Compared to 32-bit WAH, VAL-WAH produces up to 1.8× smaller bitmaps and achieves query times that are 30% faster.
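The effect of the encoding length on compressed size can be illustrated with a simplified word-aligned run-length encoder. This is only a sketch in the spirit of WAH, not the authors' VAL-WAH algorithm: the function names (`encode`, `decode`, `size_in_bits`), the tuple representation of words, and the parameter `seg_len` are all assumptions made for illustration. Each word is assumed to occupy `seg_len + 1` bits (one flag bit plus a `seg_len`-bit payload), and a fill word is assumed to use `seg_len - 1` bits for its run counter.

```python
def encode(bits, seg_len):
    """Encode a 0/1 sequence into literal and fill words (illustrative only).

    A word is modeled as a tuple instead of a packed integer:
      ("lit", group)        a mixed group of seg_len bits, stored verbatim
      ("fill", bit, count)  `count` consecutive groups consisting only of `bit`
    """
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2")
    # Assumed layout: flag bit + fill bit + (seg_len - 1)-bit run counter.
    max_run = 2 ** (seg_len - 1) - 1
    # Pad with zeros so the length is a multiple of seg_len.
    padded = list(bits) + [0] * (-len(bits) % seg_len)
    words = []
    for i in range(0, len(padded), seg_len):
        group = tuple(padded[i:i + seg_len])
        if len(set(group)) == 1:  # group is all zeros or all ones
            bit = group[0]
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == bit and last[2] < max_run:
                # Extend the previous fill instead of emitting a new word.
                words[-1] = ("fill", bit, last[2] + 1)
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("lit", group))
    return words


def decode(words, seg_len):
    """Expand the words back into a flat bit list (including any padding)."""
    out = []
    for word in words:
        if word[0] == "fill":
            out.extend([word[1]] * (seg_len * word[2]))
        else:
            out.extend(word[1])
    return out


def size_in_bits(words, seg_len):
    """Compressed size, assuming every word occupies seg_len + 1 bits."""
    return len(words) * (seg_len + 1)
```

Under these assumptions, a 201-bit bitmap with a single set bit in the middle encodes to three words (fill, literal, fill) at either length, so it takes 24 bits with 7-bit segments versus 96 bits with 31-bit segments. Denser or longer bitmaps can shift the balance, which is the trade-off the abstract refers to.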
Keywords :
data compression; data structures; query processing; VAL-WAH data set; binary representation; bit vector; bitmap compression schemes; bitmap indices; bitwise operations; compression ratio; data warehouses; fixed encoding length size; input parameter; one-size-fits-all approach; query efficiency; query time; read-only repositories; scientific databases; specialized run-length compression techniques; tunable compression framework; variable encoding lengths; word length; Algorithm design and analysis; Computer architecture; Decoding; Encoding; Indexes; Query processing; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816675
Filename :
6816675