  • DocumentCode
    3375162
  • Title
    Order preserving string compression
  • Author
    Antoshenkov, Gennady; Lomet, David; Murray, James
  • Author_Institution
    Digital Equipment Corp., Maynard, MA, USA
  • fYear
    1996
  • fDate
    26 Feb-1 Mar 1996
  • Firstpage
    655
  • Lastpage
    663
  • Abstract
    Order-preserving compression can improve sorting and searching performance, and hence the performance of database systems. We describe a new parsing (tokenization) technique that can be applied to variable-length “keys”, producing substantial compression. It can both compress and decompress data, permitting variable lengths for dictionary entries and compressed forms. The key notion is to partition the space of strings into ranges, encoding the common prefix of each range. We illustrate our method with padding character compression for multi-field keys, demonstrating the dramatic gains possible. A specific version of the method has been implemented in Digital's Rdb relational database system to enable effective multi-field compression.
  • Keywords
    data compression; encoding; relational databases; sorting; Digital Rdb relational database system; compressed forms; data decompression; database systems performance; multi-field compression; multi-field keys; order-preserving string compression; padding character compression; parsing technique; range common prefix encoding; searching performance; sorting performance; string-space partitioning; tokenization technique; variable-length dictionary entries; variable-length keys; Arithmetic; Binary trees; Data compression; Database systems; Dictionaries; Encoding; Frequency; Probability; Relational databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 1996. Proceedings of the Twelfth International Conference on
  • Conference_Location
    New Orleans, LA
  • ISSN
    1063-6382
  • Print_ISBN
    0-8186-7240-4
  • Type
    conf
  • DOI
    10.1109/ICDE.1996.492216
  • Filename
    492216
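
The range-partitioning idea stated in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm or the Rdb implementation; it is a simplified toy built on assumptions: a 27-character key alphabet (`ALPHABET`), a hand-picked token set, one integer code per range, and the hypothetical helper names `succ`, `build_dictionary`, `encode`, and `decode`. Every range of the string space gets a non-empty prefix shared by all strings in it, and range indices serve as codes, so comparing code lists gives the same order as comparing the original keys.

```python
from bisect import bisect_right

# Assumed key alphabet, listed in its sort order (space sorts before letters).
ALPHABET = " abcdefghijklmnopqrstuvwxyz"


def succ(token):
    """Smallest string greater than every string that starts with `token`."""
    return token[:-1] + chr(ord(token[-1]) + 1)


def build_dictionary(tokens):
    """Partition the string space into sorted ranges with a shared prefix each.

    Range i covers [lows[i], lows[i + 1]). Every string in a range that
    begins at a token starts with that token; every other range lies
    within a single first character, which is used as its prefix.
    """
    bounds = set(ALPHABET)
    for token in tokens:
        bounds.update((token, succ(token)))
    lows = sorted(bounds)
    prefixes = [lo if lo in tokens else lo[0] for lo in lows]
    return lows, prefixes


def encode(key, lows, prefixes):
    """Emit the index of the range holding the remaining key, then strip
    that range's common prefix, until the key is consumed."""
    codes = []
    while key:
        i = bisect_right(lows, key) - 1  # last range whose lower bound <= key
        codes.append(i)
        key = key[len(prefixes[i]):]
    return codes


def decode(codes, prefixes):
    """Rebuild the key by concatenating the prefix of each coded range."""
    return "".join(prefixes[i] for i in codes)


if __name__ == "__main__":
    # A run of four padding spaces and "th" are the illustrative tokens.
    lows, prefixes = build_dictionary({"    ", "th"})
    for key in ["ann     smith", "anna    smith", "the     other"]:
        codes = encode(key, lows, prefixes)
        print(repr(key), len(key), "chars ->", len(codes), "codes")
```

Because the codes are range indices assigned in sorted order, two keys that fall into different ranges compare the same way as their first codes, and two keys in the same range lose the same prefix before the comparison continues. A practical scheme would additionally choose tokens from data statistics and assign variable-length codes, both of which this sketch omits.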