DocumentCode :
3621511
Title :
Compression of small text files using syllables
Author :
J. Lansky;M. Zemlicka
Author_Institution :
Fac. of Mathematics & Phys., Charles Univ., Czech Republic
fYear :
2006
fDate :
2006
Lastpage :
458
Abstract :
Summary form only given. We adapted the well-known adaptive Huffman coding and LZW algorithms to use syllables and words instead of characters for text compression. We tested the algorithms on collections of small or medium-sized files. On English documents, the syllable-based algorithms give the expected results: they outperform the character-based versions and are outperformed by the word-based versions of the same algorithm. According to our tests, both syllable- and word-based compression methods are sensitive to the initial setting of their dictionaries. The decomposition of words into syllables is not trivial and is language dependent. An open issue is the applicability of syllable-based compression to other languages (such as German, Russian, or Hungarian) and its use in conjunction with other algorithms such as block-sorting lossless compression.
Keywords :
"Natural languages","Mathematics","Physics","Huffman coding","Testing","Data compression","Writing","Encoding","Compression algorithms","Adaptive coding"
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2006. DCC 2006. Proceedings
ISSN :
1068-0314
Print_ISBN :
0-7695-2545-8
Type :
conf
DOI :
10.1109/DCC.2006.16
Filename :
1607301