DocumentCode
2465061
Title
Searching for optimal alphabet for data compression using simulated annealing
Author
Platos, Jan ; Kromer, Pavel
Author_Institution
Dept. of Comput. Sci., VSB-Tech. Univ. of Ostrava, Ostrava, Czech Republic
fYear
2012
fDate
14-17 Oct. 2012
Firstpage
468
Lastpage
473
Abstract
Data compression is very important today and will be even more important in the future. Textual data use only a limited alphabet, that is, a limited number of distinct symbols (letters, numbers, diacritics, dots, spaces, etc.). In most languages, letters are joined into syllables and words. All three units (characters, syllables, and words) are useful in text compression, but none of them is the best for every file. This paper describes a variant of an algorithm for evolving an alphabet from characters, 2-grams and 3-grams that is optimal for the compression of text files. We used Simulated Annealing to evolve the alphabet. The efficiency of the new variant is tested on four compression algorithms. The achieved results are very promising.
Keywords
data compression; simulated annealing; text analysis; 2-grams; 3-grams; characters; compression algorithm; evolving alphabet; optimal alphabet; syllables; text file compression; textual data; words; Compression algorithms; Cooling; Encoding; Genetic algorithms; Burrows Wheeler transformation; Huffman encoding; LZ77; LZW; alphabet optimization
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
Conference_Location
Seoul
Print_ISBN
978-1-4673-1713-9
Electronic_ISBN
978-1-4673-1712-2
Type
conf
DOI
10.1109/ICSMC.2012.6377768
Filename
6377768