Title :
Improving PPM Algorithm Using Dictionaries
Author :
Yichuan Hu ; Jianzhong Zhang ; Farooq Khan ; Ying Li
Author_Institution :
Dept. of ESE, Univ. of Pennsylvania, Philadelphia, PA, USA
Abstract :
We propose a method to improve the traditional character-based PPM text compression algorithm for natural languages. Treating a text file as a sequence of alternating words and non-words, the algorithm encodes non-words and prefixes of words using character-based context models, and encodes suffixes of words using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a single unit, which improves compression efficiency. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing; 2) it does not need an explicit codeword to signal a switch between context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Details of the algorithm are given in the paper.
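The word/non-word split and the prefix/suffix division described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `plan_encoding` and `split_word`, the fixed prefix length of 2, the ASCII-only notion of a "word", and the prefix-keyed dictionary layout are all assumptions made for illustration. The sketch only decides which model would handle each piece of text; it does not perform PPM probability estimation or arithmetic coding, and it does not show how the switch between models is signaled without an explicit codeword.

```python
import re
import string

# Assumption: a "word" is a run of ASCII letters; everything else is a non-word.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")

def split_word(word, dictionary, prefix_len=2):
    """Split a word into (prefix, suffix) if the suffix is listed under the prefix.

    `dictionary` is a hypothetical mapping: prefix -> set of known suffixes.
    If the suffix is unknown, the whole word is returned as the prefix so that
    it is handled character by character.
    """
    prefix, suffix = word[:prefix_len], word[prefix_len:]
    if suffix and suffix in dictionary.get(prefix, ()):
        return prefix, suffix
    return word, ""

def plan_encoding(text, dictionary, prefix_len=2):
    """Return a list of (model, piece) pairs covering the text in order.

    "chars" marks pieces for a character-based context model;
    "dict" marks a suffix that a dictionary model could code as one symbol.
    """
    plan = []
    for token in TOKEN_RE.findall(text):
        if token[0] in string.ascii_letters:
            prefix, suffix = split_word(token, dictionary, prefix_len)
            plan.append(("chars", prefix))
            if suffix:
                plan.append(("dict", suffix))
        else:
            # Non-words (spaces, punctuation, digits) stay character-based.
            plan.append(("chars", token))
    return plan

if __name__ == "__main__":
    toy_dictionary = {"co": {"mpression", "ntext"}}
    for model, piece in plan_encoding("text compression!", toy_dictionary):
        print(model, repr(piece))
```

In this toy run, the suffix of "compression" is the only piece routed to the dictionary model, so nine characters would be coded as one unit, which is the source of the efficiency gain the abstract describes.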
Keywords :
data compression; dictionaries; natural language processing; text analysis; alternating words; character based PPM text compression algorithm; character based context models; dictionary models; natural languages; non words; words suffixes; Computational modeling; Context; Context modeling; Data compression; Decoding; Dictionaries; Encoding; Dictionary model; Markov model; PPM; Text compression
Conference_Title :
Data Compression Conference (DCC), 2011
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-61284-279-0
DOI :
10.1109/DCC.2011.63