DocumentCode :
1594571
Title :
High Performance Word-Codeword Mapping Algorithm on PPM
Author :
Adiego, Joaquin ; Martinez-Prieto, Miguel A. ; Fuente, P.
Author_Institution :
Dept. de Inf., Univ. de Valladolid, Valladolid
fYear :
2009
Firstpage :
23
Lastpage :
32
Abstract :
The word-codeword mapping technique allows words to be managed in PPM modelling when a natural language text file is being compressed. The main idea for managing words is to assign them codes in order to improve the compression. Previous work focused on proposing several adaptive mapping algorithms and evaluating them. In this paper, we propose a semi-static word-codeword mapping method that takes advantage of previous knowledge of some statistical data of the vocabulary. We test our idea by implementing a basic prototype, dubbed mppm2, which also retains all the desirable features of a word-codeword mapping technique. The comparison with other techniques and compressors shows that our proposal is a very competitive choice for compressing natural language texts. In fact, empirical results show that our prototype achieves very good compression for this type of document.
Keywords :
data compression; natural language processing; statistical analysis; text analysis; PPM modelling; natural language text file; semi-static word-codeword mapping method; statistical data; vocabulary; word-codeword mapping technique; Data compression; Natural Language Modelling; PPM; Text Compression;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2009. DCC '09.
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4244-3753-5
Type :
conf
DOI :
10.1109/DCC.2009.40
Filename :
4976446