DocumentCode :
3375162
Title :
Order preserving string compression
Author :
Antoshenkov, Gennady ; Lomet, David ; Murray, James
Author_Institution :
Digital Equipment Corp., Maynard, MA, USA
fYear :
1996
fDate :
26 Feb-1 Mar 1996
Firstpage :
655
Lastpage :
663
Abstract :
Order-preserving compression can improve sorting and searching performance, and hence the performance of database systems. We describe a new parsing (tokenization) technique that can be applied to variable-length "keys", producing substantial compression. It can both compress and decompress data, permitting variable lengths for dictionary entries and compressed forms. The key notion is to partition the space of strings into ranges, encoding the common prefix of each range. We illustrate our method with padding character compression for multi-field keys, demonstrating the dramatic gains possible. A specific version of the method has been implemented in Digital's Rdb relational database system to enable effective multi-field compression.
Keywords :
data compression; encoding; relational databases; sorting; Digital Rdb relational database system; compressed forms; data decompression; database systems performance; multi-field compression; multi-field keys; order-preserving string compression; padding character compression; parsing technique; range common prefix encoding; searching performance; sorting performance; string-space partitioning; tokenization technique; variable-length dictionary entries; variable-length keys; Arithmetic; Binary trees; Data compression; Database systems; Dictionaries; Encoding; Frequency; Probability; Relational databases
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 1996. Proceedings of the Twelfth International Conference on
Conference_Location :
New Orleans, LA
ISSN :
1063-6382
Print_ISBN :
0-8186-7240-4
Type :
conf
DOI :
10.1109/ICDE.1996.492216
Filename :
492216