• DocumentCode
    519882
  • Title
    Language-independent word-based text compression with fast decompression
  • Author
    Grabowski, Szymon ; Swacha, Jakub
  • Author_Institution
    Comput. Eng. Dept., Tech. Univ. of Lodz, Lodz, Poland
  • fYear
    2010
  • fDate
    20-23 April 2010
  • Firstpage
    158
  • Lastpage
    162
  • Abstract
    A classic way to improve text compression is to replace words with references to a text dictionary, either external or stored together with the archive. We advocate the second option, since even for a single language (e.g., English) it is practically impossible for one dictionary to fit the various kinds of modern texts well. Two problems must be solved: how to assign codewords to the individual words of the parsed text, and how to represent the dictionary compactly. The resulting data are then passed to a backend compressor. Since in many scenarios texts are decompressed (read) more often than compressed (written), we focus on LZ77 backend compression algorithms, in particular Deflate, used in the zip/gzip standards, whose well-known asset is very fast decompression.
  • Keywords
    data compression; text analysis; word processing; Deflate; LZ77 backend compression algorithms; codewords; fast decompression; language independent word based text compression; parsed text; text dictionary; zip-gzip standards; Cascading style sheets; Compression algorithms; DNA; Dictionaries; HTML; Natural languages; Postal services; Protein sequence; Spatial databases; XML; byte codes; dictionary compression; text compression
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Perspective Technologies and Methods in MEMS Design (MEMSTECH), 2010 Proceedings of VIth International Conference on
  • Conference_Location
    Lviv
  • Print_ISBN
    978-1-4244-7325-0
  • Electronic_ISBN
    978-966-2191-11-0
  • Type
    conf
  • Filename
    5499297
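
The pipeline outlined in the abstract (parse the text into words, replace each word with a codeword, store the dictionary inside the archive, then hand everything to a Deflate backend) can be illustrated with a minimal sketch. This is not the authors' algorithm: the tokenization rule, the 1-byte/2-byte codeword layout, the NUL-separated dictionary, and the function names `encode` and `decode` are all assumptions made for illustration. Only the `re`, `struct`, `zlib`, and `collections` calls come from the Python standard library.

```python
import re
import struct
import zlib
from collections import Counter

# Assumption: tokens are alternating runs of word / non-word characters.
# In Python 3, \w is Unicode-aware, so no language-specific rules are needed.
TOKEN_RE = re.compile(r"\w+|\W+")
MAX_TOKENS = 128 + 128 * 256  # capacity of the 1- and 2-byte codewords below


def encode(text: str) -> bytes:
    tokens = TOKEN_RE.findall(text)
    # Most frequent tokens get the smallest ranks, hence the shortest codewords.
    vocab = [tok for tok, _ in Counter(tokens).most_common()]
    if len(vocab) > MAX_TOKENS or "\0" in text:
        raise ValueError("input outside the limits of this sketch")
    rank = {tok: i for i, tok in enumerate(vocab)}

    codes = bytearray()
    for tok in tokens:
        i = rank[tok]
        if i < 128:
            codes.append(i)               # 1-byte codeword: 0xxxxxxx
        else:
            j = i - 128
            codes.append(128 + (j >> 8))  # 2-byte codeword: 1xxxxxxx yyyyyyyy
            codes.append(j & 0xFF)

    # The dictionary travels with the archive: length prefix, entries, codes.
    dictionary = "\0".join(vocab).encode("utf-8")
    payload = struct.pack("<I", len(dictionary)) + dictionary + bytes(codes)
    return zlib.compress(payload, 9)      # Deflate backend


def decode(blob: bytes) -> str:
    payload = zlib.decompress(blob)
    (dict_len,) = struct.unpack_from("<I", payload, 0)
    vocab = payload[4:4 + dict_len].decode("utf-8").split("\0")
    codes = payload[4 + dict_len:]
    out, pos = [], 0
    while pos < len(codes):
        b = codes[pos]
        if b < 128:
            out.append(vocab[b])
            pos += 1
        else:
            out.append(vocab[128 + ((b - 128) << 8) + codes[pos + 1]])
            pos += 2
    return "".join(out)
```

Decoding here is one Deflate pass followed by table lookups, which reflects the abstract's emphasis on fast decompression. Whether such a transform actually improves on plain Deflate depends on the input, and the sketch makes no claim about compression ratios.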