Title :
Substitution coder — A reversible data transform for lossless text compression
Author :
Rexline, S.J. ; Robert, L.
Author_Institution :
Dept. of Comput. Sci., Loyola Coll., Chennai, India
Abstract :
In this paper, we present a new text transformation technique that advances the existing lossless, reversible text transformation technique called Substitution coder. Substitution coders are a class of lossless text transformation algorithms that operate by searching for matches between the text to be compressed and a set of words contained in a dictionary maintained by the coder. When the coder identifies such a match, it substitutes a reference to the word's position in the dictionary. Some dictionary coders use a static dictionary, in which the entire set of words is determined before coding commences and does not vary during the transformation process. This approach is most often used when the information to be encoded is fixed and large. A lossless data compression algorithm must preserve the data through the encoding and decoding process. The purpose of the fixed word sequence maintained in the dictionary is to generate a fixed but artificial context in the transformed text that the backend compression algorithm can exploit. In our approach, the dictionary supports appending new words without changing the codes of existing words. The proposed algorithm is implemented and tested using the Calgary corpus and Gutenberg files.
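As an illustration only, the dictionary substitution described in the abstract might be sketched as follows. This is not the authors' implementation: the class name SubstitutionCoder, the "~" escape marker, the "~N;" reference format, and the letters-only tokenisation are all assumptions made for this sketch, since the abstract does not specify the code format. The sketch shows the two properties the abstract states: the transform is reversible, and new words are appended without changing the codes of existing words.

```python
import re


class SubstitutionCoder:
    """Toy word-substitution transform with an append-only dictionary.

    Hypothetical sketch: the escape marker and "~N;" reference format
    are assumptions, not the format used in the paper.
    """

    ESC = "~"  # assumed escape character that introduces a reference

    def __init__(self, words):
        self.words = []   # position -> word
        self.index = {}   # word -> position
        for w in words:
            self.add(w)

    def add(self, word):
        """Append a word if it is new; existing codes never change."""
        if word not in self.index:
            self.index[word] = len(self.words)
            self.words.append(word)
        return self.index[word]

    def encode(self, text):
        """Replace dictionary words with "~N;" and escape literal "~"."""
        def sub(match):
            token = match.group(0)
            if token == self.ESC:
                return self.ESC * 2          # literal "~" becomes "~~"
            if token in self.index:
                return f"{self.ESC}{self.index[token]};"
            return token                      # unknown word: leave as-is
        return re.sub(r"[A-Za-z]+|~", sub, text)

    def decode(self, encoded):
        """Invert encode(): expand references and unescape "~~"."""
        def sub(match):
            body = match.group(1)
            if body == self.ESC:
                return self.ESC
            return self.words[int(body[:-1])]  # drop trailing ";"
        return re.sub(r"~(~|\d+;)", sub, encoded)


coder = SubstitutionCoder(["the", "quick", "fox"])
original = "the quick brown fox ~ the5"
encoded = coder.encode(original)
assert coder.decode(encoded) == original      # lossless round trip

coder.add("brown")                            # appended at the next free position
assert coder.decode(encoded) == original      # earlier output still decodes
```

The trailing ";" terminates each reference so that a digit following a substituted word in the input is not read as part of the index. In a pipeline of the kind the abstract describes, the encoded text would then be passed to a backend compressor such as BZIP2.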
Keywords :
data compression; decoding; text analysis; Calgary corpus; Gutenberg files; backend compression algorithm; decoding process; dictionary coders; encoding process; lossless data compression algorithm; lossless text compression; lossless text transformation algorithms; reversible data transform; reversible text makeover technique; substitution coder; Compression algorithms; Data compression; Decoding; Dictionaries; Encoding; Sorting; Transforms; BZIP2; Dictionary based compression; decoding; encoding; preprocessing; word transformation;
Conference_Title :
Information, Communications and Signal Processing (ICICS) 2011 8th International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4577-0029-3
DOI :
10.1109/ICICS.2011.6173125