• DocumentCode
    3254366
  • Title
    Constructing word-based text compression algorithms
  • Author
    Horspool, R. Nigel; Cormack, Gordon V.
  • Author_Institution
    Dept. of Comput. Sci., Victoria Univ., Victoria, BC, Canada
  • fYear
    1992
  • fDate
    24-27 March 1992
  • Firstpage
    62
  • Lastpage
    71
  • Abstract
    Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, in general, alternate maximal strings of alphanumeric characters and nonalphanumeric characters. The compression algorithm would be able to take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.
  • Keywords
    data compression; encoding; 8 bit; ASCII codes; English; LZW compression algorithm; adaptive Huffman compression algorithm; alphanumeric characters; alternate maximal strings; coding; correlations; nonalphanumeric characters; source alphabet; text compression algorithms; word-based context modelling; Compression algorithms; Computer science; Context modeling; Encoding
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 1992. DCC '92.
  • Conference_Location
    Snowbird, UT, USA
  • Print_ISBN
    0-8186-2717-4
  • Type
    conf
  • DOI
    10.1109/DCC.1992.227475
  • Filename
    227475
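
The abstract describes a source alphabet whose symbols are alternate maximal strings of alphanumeric and nonalphanumeric characters. A minimal sketch of that parsing step is given below, assuming "alphanumeric" means ASCII letters and digits; the function name `word_tokens` is hypothetical and is not taken from the paper, and this is not presented as the authors' implementation.

```python
import re

# Assumption: "alphanumeric" is taken to mean ASCII letters and digits only.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")


def word_tokens(text):
    """Split text into maximal runs of alphanumeric characters and maximal
    runs of nonalphanumeric characters, in order of appearance.

    Because every run is maximal, the two kinds of token alternate, and
    concatenating the tokens reproduces the input exactly.
    """
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    sample = "Text compression, word by word."
    tokens = word_tokens(sample)
    print(tokens)
    # The parse is lossless: joining the tokens gives back the original text.
    print("".join(tokens) == sample)
```

Each distinct token would then serve as one symbol of the enlarged alphabet that a word-based coder operates on.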