• DocumentCode
    3050189
  • Title
    Higher compression from the Burrows-Wheeler transform by modified sorting
  • Author
    Chapin, Brenton ; Tate, Stephen R.
  • Author_Institution
    Dept. of Comput. Sci., North Texas Univ., Denton, TX, USA
  • fYear
    1998
  • fDate
    30 Mar-1 Apr 1998
  • Firstpage
    532
  • Abstract
    Summary form only given. The Burrows-Wheeler transform (BWT) compression technique is based on sorting substrings of the input, and has a performance rivalling the best previously known techniques. We show that the ordering used in the sorting stage of the BWT, an aspect hitherto ignored, can have a significant impact on the size of the compressed data. We modify the sorting order in two separate ways. First, we try reordering the symbol alphabet, and doing a standard sort based on the permuted character set. This is particularly interesting because the BWT's sensitivity to alphabet ordering is nearly unique among general-purpose compression schemes. Previous techniques, including statistical techniques (such as the PPM algorithms) and dictionary techniques (represented by LZ77, LZ78, and their descendants), are largely based on pattern matching, which is entirely independent of the encoding used for the source alphabet. On files in which the alphabet is arbitrarily ordered, such as ASCII text and certain domain-specific encodings, such as the geo file from the Calgary Compression Corpus, this technique improved the compression ratio of the BWT-based compression algorithm. On the other hand, data that already had a significant alphabet ordering, such as image data, showed little improvement with this technique. The second modified sorting technique was to modify the sorting algorithm itself to order strings in a manner analogous to reflected Gray codes. In particular, we alternated increasing and decreasing order on the second character position, changing whenever the character in the first position changed.
  • Keywords
    data compression; image coding; transform coding; transforms; ASCII text; BWT-based compression algorithm; Burrows-Wheeler transform; Calgary Compression Corpus; compression ratio; dictionary techniques; experimental results; general-purpose compression; geo file; image data; modified sorting; pattern matching; performance; permuted character set; reflected Gray codes; sorting algorithm; sorting order modification; source alphabet encoding; statistical techniques; substrings sorting; symbol alphabet reordering; Compression algorithms; Computer science; Dictionaries; Encoding; Image coding; Pattern matching; Reflective binary codes; Sorting; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 1998. DCC '98. Proceedings
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-8186-8406-2
  • Type
    conf
  • DOI
    10.1109/DCC.1998.672253
  • Filename
    672253
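
The two sorting modifications described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a naive BWT that sorts all rotations of the input, and the function name `bwt` and the parameters `alphabet_order` and `reflect_second` are hypothetical names chosen for illustration. It assumes that `alphabet_order` lists every symbol of the input, that the reflection applies to the second character position only (later positions stay in increasing order), and that the direction flips each time the first character moves to the next symbol present in the input.

```python
def bwt(text, alphabet_order=None, reflect_second=False):
    """Naive BWT by sorting all rotations (illustrative sketch only).

    alphabet_order: optional sequence giving a permuted symbol order
                    (hypothetical parameter; must cover every symbol in text).
    reflect_second: if True, alternate increasing/decreasing order on the
                    second character each time the first character changes.
    Returns (last_column, index_of_original_rotation).
    """
    n = len(text)
    present = set(text)
    if alphabet_order is None:
        alphabet_order = sorted(present)
    # Rank only the symbols that occur, so rank parity matches the
    # alternation between consecutive first-character groups.
    used = [ch for ch in alphabet_order if ch in present]
    rank = {ch: i for i, ch in enumerate(used)}
    k = len(rank)

    def key(i):
        rotation = text[i:] + text[:i]
        ranks = [rank[ch] for ch in rotation]
        # Odd-ranked first characters get a reversed second position.
        if reflect_second and n > 1 and ranks[0] % 2 == 1:
            ranks[1] = k - 1 - ranks[1]
        return ranks

    order = sorted(range(n), key=key)
    last_column = "".join(text[(i - 1) % n] for i in order)
    return last_column, order.index(0)


if __name__ == "__main__":
    print(bwt("bbabaa"))                       # standard byte order
    print(bwt("bbabaa", reflect_second=True))  # reflected second position
    print(bwt("banana", alphabet_order="nba")) # permuted alphabet
```

On the toy input `"bbabaa"` the standard order yields the last column `bbaaba`, while the reflected order yields `bbaaab`, which has one fewer run. This only shows the mechanism; it says nothing about compression gains on real data, which the abstract reports separately.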