• DocumentCode
    3061580
  • Title

    Design consideration for multi-lingual cascading text compressors

  • Author

    Chi, Chi-Hung ; Zhang, Yan

  • Author_Institution
    Sch. of Comput., Nat. Univ. of Singapore, Singapore
  • fYear
    1999
  • fDate
    29-31 Mar 1999
  • Firstpage
    520
  • Abstract
    Summary form only given. We study the cascading of LZ variants to Huffman coding for multilingual documents. Two models are proposed: the static model and the adaptive (dynamic) model. The static model makes use of the dictionary generated by the LZW algorithm in Chinese dictionary-based Huffman compression to achieve better performance. The dynamic model is an extension of the static cascading model. During the insertion of phrases into the dictionary the frequency count of the phrases is updated so that a dynamic Huffman tree with variable length output tokens is obtained. We propose a new method to capture the “LZW dictionary” “by picking up the dictionary entries during decompression. The general idea is the adding of delimiters during the decompression process so that the decompressed files are segmented into phrases that reflect how the LZW compressor makes use of its dictionary phrases to encode the source. The idea of the adaptive cascading model can be thought as an extension of the Chinese LZW compression. Since the size of the header is one important performance bottleneck in the static cascading model we propose the adaptive cascading model to address this issue. The LZW compressor is now outputting not a fixed length token, but a variable length Huffman code from the Huffman tree. It is expected that such a compressor can achieve very good compression performance. In our adaptive cascading model we choose LZW instead of LZSS because the LZW algorithm preserves more information than the LZSS algorithm does. This characteristic is found to be very useful in helping Chinese compressors to attain better performance
  • Keywords
    Huffman codes; adaptive codes; character sets; source coding; text analysis; tree data structures; variable length codes; Chinese dictionary; LZW dictionary; adaptive cascading model; cascading; decompression; dynamic Huffman tree; multilingual documents; multilingual text compressors; performance; source coding; static model; variable length Huffman code; Compressors; Dictionaries; Frequency; Huffman coding;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference, 1999. Proceedings. DCC '99
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-0096-X
  • Type

    conf

  • DOI
    10.1109/DCC.1999.785677
  • Filename
    785677