Title :
ETAOSD: Static dictionary-based transformation method for text compression
Author :
Baloul, Fadlelmoula Mohamed ; Abdullah, Mohd Harun ; Babikir, Elsadig Ahmed
Author_Institution :
Dept. of Inf. Technol., Coll. of Appl. Sci., Sohar, Oman
Abstract :
This paper presents a new static dictionary-based text transformation algorithm that increases the compression ratio achieved by standard compression tools. The basic idea is to define a pattern for each word in a static dictionary by replacing all or most of the word's characters with the character that occurs most frequently in text files. The proposed algorithm transforms any text file into an encrypted file of almost the same size as the original but with different statistical properties. The transformation method was designed, implemented, and tested on the Gutenberg corpus. The results show varying levels of improvement across common compression tools and methods, including arithmetic coding, Huffman coding, Bzip2, Gzip, and WinZip. The improvement is most pronounced when the patterns of the transformed words are passed through a low-cost run-length encoding (RLE) step. With Bzip2, the output files achieved a compression ratio of about 76.75% and an average code length of 1.88. The results are promising and could be improved further by applying a dynamic dictionary-based text transformation technique.
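The word-to-pattern idea described in the abstract can be illustrated with a small, reversible toy transform. This is only a sketch under stated assumptions, not the authors' ETAOSD mapping: the six-word `DICTIONARY`, the choice of `e` as the fill character, the rule that the last letter of a pattern distinguishes words of equal length, and the `*` escape marker are all hypothetical choices made for illustration.

```python
import re

# Hypothetical static dictionary; the paper's actual dictionary is not given here.
DICTIONARY = ["the", "and", "of", "to", "in", "that"]
FILL = "e"    # assumed stand-in for the "most frequently used character"
ESCAPE = "*"  # assumed escape marker; the input text must not contain it


def build_patterns(words):
    """Map each word to a same-length pattern made mostly of FILL.

    The last character encodes the word's rank among dictionary words
    of the same length, so every pattern is unique.
    """
    patterns = {}
    count_by_length = {}
    for word in words:
        rank = count_by_length.get(len(word), 0)
        if rank >= 26:
            raise ValueError("toy scheme supports 26 words per length")
        count_by_length[len(word)] = rank + 1
        patterns[word] = FILL * (len(word) - 1) + chr(ord("a") + rank)
    return patterns


PATTERNS = build_patterns(DICTIONARY)
WORDS_BY_PATTERN = {p: w for w, p in PATTERNS.items()}

WORD_RE = re.compile(r"[A-Za-z]+")
TOKEN_RE = re.compile(re.escape(ESCAPE) + r"?[A-Za-z]+")


def transform(text):
    """Replace dictionary words by their patterns; escape look-alikes."""
    if ESCAPE in text:
        raise ValueError("input must not contain the escape marker")

    def substitute(match):
        word = match.group(0)
        if word in PATTERNS:
            return PATTERNS[word]
        if word in WORDS_BY_PATTERN:  # plain word that looks like a pattern
            return ESCAPE + word
        return word

    return WORD_RE.sub(substitute, text)


def restore(text):
    """Invert transform(): unescape look-alikes, map patterns back to words."""

    def substitute(match):
        token = match.group(0)
        if token.startswith(ESCAPE):
            return token[1:]
        return WORDS_BY_PATTERN.get(token, token)

    return TOKEN_RE.sub(substitute, text)


print(transform("the cat and that eea"))  # eea cat eeb eeea *eea
```

The transformed text keeps the original length except for the occasional escape marker, and its character distribution is skewed toward the fill character, which is the property the abstract relies on. Whether a standard compressor benefits from this toy mapping on a given file would have to be measured.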
Keywords :
cryptography; data compression; dictionaries; statistical analysis; text analysis; Bzip2 tool; ETAOSD; Gutenburg Corpus; Gzip tool; Huffman tool; RLE algorithm; WinZip tool; arithmetic tool; data compression ratio; dynamic dictionary-based text transformation technique; encrypted file; running length encoding algorithm; standard compression tools; static dictionary-based algorithm; static dictionary-based transformation method; statistical properties; text compression; text file transformation; Data compression; Decoding; Dictionaries; Encoding; Redundancy; Standards; Stress; Average Code Length (ACL); Text Compression; Text Preprocessing; Text Transformation;
Conference_Title :
Computing, Electrical and Electronics Engineering (ICCEEE), 2013 International Conference on
Conference_Location :
Khartoum
Print_ISBN :
978-1-4673-6231-3
DOI :
10.1109/ICCEEE.2013.6633967