Title :
On the use of words as source alphabet symbols in PPM
Author :
Adiego, Joaquín ; De la Fuente, Pablo
Author_Institution :
Dpto. de Informatica, Valladolid Univ.
Abstract :
Summary form only given. We explore the use of words as the basic unit in PPM. We pursue this goal in two ways: (1) we add a preprocessing layer before PPM that replaces words with two-byte codewords, which are then encoded with a conventional PPM; and (2) we modify PPM so that it treats words, rather than characters, as symbols, so that this PPM variant makes its predictions over sequences of consecutive words. Experimental results show that both techniques improve on character-based compressors (including the PPM version used and adapted in the prototypes) for files larger than 1 Mb; the threshold is due to the overhead of storing the vocabulary. Prototype 1 compresses in the same time while requiring slightly more memory (coming closest to PPMDi as the file size grows), and it is about 5000% faster and uses 92% less memory than PPMZ. Prototype 2 demands much more time and memory, and is similar to PPMZ, one of the better PPM variations.
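The first approach (a layer that maps words to two-byte codewords ahead of a conventional PPM) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `encode_words` and `decode_words`, the tokenization rule (runs of word characters and runs of separators are both treated as vocabulary entries), the first-appearance ordering of the vocabulary, and the big-endian codeword layout. The PPM stage itself is omitted; the returned byte stream is what would be handed to a byte-oriented PPM coder, and the vocabulary would have to be stored alongside it.

```python
import re
import struct

# Assumption: words and separator runs are both tokens, so the text
# can be rebuilt exactly by concatenating decoded tokens.
TOKEN_RE = re.compile(r"\w+|\W+")


def encode_words(text):
    """Replace each token with a two-byte codeword (hypothetical helper).

    Returns (vocabulary, codeword_bytes). The vocabulary lists tokens in
    order of first appearance; a token's index is its codeword.
    """
    vocab = {}
    out = bytearray()
    for token in TOKEN_RE.findall(text):
        # New tokens get the next free index; known tokens reuse theirs.
        code = vocab.setdefault(token, len(vocab))
        if code > 0xFFFF:
            raise ValueError("more than 65536 distinct tokens")
        out += struct.pack(">H", code)  # two bytes, big-endian
    return list(vocab), bytes(out)


def decode_words(vocab, data):
    """Invert encode_words: map each two-byte codeword back to its token."""
    codes = struct.unpack(">%dH" % (len(data) // 2), data)
    return "".join(vocab[c] for c in codes)


text = "the cat and the dog and the cat"
vocab, data = encode_words(text)
restored = decode_words(vocab, data)
```

Because the vocabulary must accompany the codeword stream, its size is a fixed cost that weighs more heavily on small inputs, which is consistent with the abstract's remark about vocabulary storage.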
Keywords :
data compression; prediction by partial matching; character-based compressors; codewords; source alphabet symbols; words sequences; Compression algorithms; Compressors; Data compression; HTML; Prototypes; Vocabulary;
Conference_Title :
Data Compression Conference, 2006. DCC 2006. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-2545-8
DOI :
10.1109/DCC.2006.60