DocumentCode :
2638010
Title :
Compressing relations and indexes
Author :
Goldstein, Jonathan ; Ramakrishnan, Raghu ; Shaft, Uri
Author_Institution :
Dept. of Comput. Sci., Wisconsin Univ., Madison, WI, USA
fYear :
1998
fDate :
23-27 Feb 1998
Firstpage :
370
Lastpage :
379
Abstract :
We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records, and is especially effective for records with many low to medium cardinality fields and numeric fields. In addition, this new technique supports very fast decompression. Promising application domains include decision support systems (DSS), since fact tables, which are by far the largest tables in these applications, contain many low and medium cardinality fields and typically no text fields. Further, our decompression rates are faster than typical disk throughputs for sequential scans; in contrast, gzip is slower. This is important in DSS applications, which often scan large ranges of records. An important distinguishing characteristic of our algorithm, in contrast to compression algorithms proposed earlier, is that we can decompress individual tuples (even individual fields), rather than a full page (or an entire relation) at a time. Also, all the information needed for tuple decompression resides on the same page with the tuple. This means that a page can be stored in the buffer pool and used in compressed form, simplifying the job of the buffer manager and improving memory utilization. Our compression algorithm also improves index structures such as B-trees and R-trees significantly by reducing the number of leaf pages and compressing index entries, which greatly increases the fan-out. We can also use lossy compression on the internal nodes of an index
Keywords :
buffer storage; data compression; decision support systems; relational databases; software performance evaluation; tree data structures; B-trees; R-trees; buffer pool; cardinality fields; disk throughput; fact tables; gzip; index compression; index structures; lossy compression; memory utilization; numeric fields; page level compression; records; relation compression; relational database; tuple decompression; Application software; Compression algorithms; Databases; Information retrieval; Insulation; Memory management; Packaging; Shafts; Throughput
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 1998. Proceedings., 14th International Conference on
Conference_Location :
Orlando, FL
ISSN :
1063-6382
Print_ISBN :
0-8186-8289-2
Type :
conf
DOI :
10.1109/ICDE.1998.655800
Filename :
655800