DocumentCode
3621511
Title
Compression of small text files using syllables
Author
J. Lansky;M. Zemlicka
Author_Institution
Fac. of Mathematics & Phys., Charles Univ., Czech Republic
fYear
2006
fDate
2006
Lastpage
458
Abstract
Summary form only given. We adapted the well-known adaptive Huffman coding and LZW algorithms to use syllables and words instead of characters for text compression. We tested the algorithms on collections of small or medium-sized files. On English documents, the syllable-based compression algorithms give the expected results: they outperform the character-based versions and are outperformed by the word-based versions of the same algorithm. According to our tests, both syllable- and word-based compression methods are sensitive to the initial setting of their dictionaries. The decomposition of words into syllables is not trivial and is language dependent. Open issues are the applicability of syllable-based compression to other languages (such as German, Russian, or Hungarian) and its use in conjunction with other algorithms such as block-sorting lossless compression.
Keywords
"Natural languages","Mathematics","Physics","Huffman coding","Testing","Data compression","Writing","Encoding","Compression algorithms","Adaptive coding"
Publisher
ieee
Conference_Titel
Data Compression Conference, 2006. DCC 2006. Proceedings
ISSN
1068-0314
Print_ISBN
0-7695-2545-8
Type
conf
DOI
10.1109/DCC.2006.16
Filename
1607301