• DocumentCode
    3621511
  • Title

    Compression of small text files using syllables

  • Author

    J. Lansky;M. Zemlicka

  • Author_Institution
    Fac. of Mathematics & Phys., Charles Univ., Czech Republic
  • fYear
    2006
  • fDate
    2006
  • Lastpage
    458
  • Abstract
    Summary form only given. We adapted the well-known adaptive Huffman coding and LZW algorithms to use syllables and words instead of characters for text compression. We tested the algorithms on collections of small or medium-sized files. On English documents, syllable-based compression gives the expected results: it outperforms the character-based version and is outperformed by the word-based version of the same algorithm. According to our tests, both syllable- and word-based compression methods are sensitive to the initial setting of their dictionaries. The decomposition of words into syllables is not trivial and is language dependent. An open issue is the applicability of syllable-based compression to other languages (such as German, Russian, or Hungarian) and its use in conjunction with other algorithms such as block-sorting lossless compression.
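
    To make the idea in the abstract concrete, the following is a minimal sketch of LZW operating on multi-character tokens rather than single characters. It is not the authors' implementation. The `tokenize` helper is a hypothetical stand-in that splits text into runs of letters and runs of non-letters (closer to words than to true syllables, whose decomposition is language dependent), and the sketch assumes that every token already appears in the initial dictionary shared by encoder and decoder.

```python
import re


def tokenize(text):
    """Hypothetical stand-in for syllable/word decomposition:
    split text into runs of letters and runs of non-letters."""
    return re.findall(r"[A-Za-z]+|[^A-Za-z]+", text)


def lzw_encode(tokens, initial):
    """LZW over a token sequence. `initial` is the shared starting
    dictionary and is assumed to contain every token that occurs."""
    table = {(t,): i for i, t in enumerate(initial)}
    codes = []
    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if candidate in table:
            phrase = candidate            # keep extending the match
        else:
            codes.append(table[phrase])   # emit the longest known phrase
            table[candidate] = len(table)  # learn the new phrase
            phrase = (tok,)
    if phrase:
        codes.append(table[phrase])
    return codes


def lzw_decode(codes, initial):
    """Inverse of lzw_encode, rebuilding the same dictionary on the fly."""
    table = [(t,) for t in initial]
    if not codes:
        return []
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # Code not yet in the table: the phrase must be the previous
            # phrase followed by its own first token.
            entry = prev + (prev[0],)
        out.extend(entry)
        table.append(prev + (entry[0],))
        prev = entry
    return out


text = "to be or not to be or not to be"
tokens = tokenize(text)
alphabet = sorted(set(tokens))   # simplified initial dictionary
codes = lzw_encode(tokens, alphabet)
restored = "".join(lzw_decode(codes, alphabet))
```

    Here the initial dictionary is built from the input itself, which a real compressor could not do. The abstract's remark that results are sensitive to the initial dictionary setting corresponds to the choice of `initial` in this sketch.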
  • Keywords
    "Natural languages","Mathematics","Physics","Huffman coding","Testing","Data compression","Writing","Encoding","Compression algorithms","Adaptive coding"
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2006. DCC 2006. Proceedings
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2545-8
  • Type

    conf

  • DOI
    10.1109/DCC.2006.16
  • Filename
    1607301