• DocumentCode
    2465061
  • Title
    Searching for optimal alphabet for data compression using simulated annealing
  • Author
    Platos, Jan ; Kromer, Pavel
  • Author_Institution
    Dept. of Comput. Sci., VSB-Tech. Univ. of Ostrava, Ostrava, Czech Republic
  • fYear
    2012
  • fDate
    14-17 Oct. 2012
  • Firstpage
    468
  • Lastpage
    473
  • Abstract
    Data compression is important today and will be even more important in the future. Textual data use only a limited alphabet, that is, a limited set of symbols (letters, numbers, diacritics, dots, spaces, etc.). In most languages, letters are joined into syllables and words. Characters, syllables, and words are all useful as the basic unit in text compression, but none of them is best for every file. This paper describes a variant of an algorithm for evolving an alphabet from characters, 2-grams, and 3-grams that is optimal for the compression of text files. Simulated annealing is used to evolve the alphabet. The efficiency of the new variant is tested on four compression algorithms, and the achieved results are very promising.
  • Keywords
    data compression; simulated annealing; text analysis; 2-grams; 3-grams; characters; compression algorithm; evolving alphabet; optimal alphabet; syllables; text file compression; textual data; words; Compression algorithms; Cooling; Encoding; Genetic algorithms; Burrows Wheeler transformation; Huffman encoding; LZ77; LZW; alphabet optimization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-1713-9
  • Electronic_ISBN
    978-1-4673-1712-2
  • Type
    conf
  • DOI
    10.1109/ICSMC.2012.6377768
  • Filename
    6377768
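
The abstract in this record describes evolving an alphabet of characters, 2-grams, and 3-grams with simulated annealing. The record gives no algorithmic detail, so the following is only a minimal sketch of that general idea, not the authors' method. The function names (`tokenize`, `cost_bits`, `anneal_alphabet`), the greedy longest-match tokenizer, the toggle-one-n-gram neighbour move, the geometric cooling schedule, and the cost function are all assumptions made for illustration. In particular, the cost is a zero-order entropy estimate plus an assumed 8 bits per stored n-gram character, standing in for the four real compression algorithms the paper evaluates.

```python
import math
import random
from collections import Counter


def tokenize(text, ngrams):
    """Greedy longest-match split into 3-grams, 2-grams, or single characters."""
    tokens, i = [], 0
    while i < len(text):
        for n in (3, 2):
            piece = text[i:i + n]
            if len(piece) == n and piece in ngrams:
                tokens.append(piece)
                i += n
                break
        else:
            # No selected n-gram starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


def cost_bits(text, ngrams):
    """Assumed cost: zero-order entropy of the token stream, plus 8 bits per
    character of every n-gram that must be stored with the alphabet."""
    counts = Counter(tokenize(text, ngrams))
    total = sum(counts.values())
    payload = -sum(c * math.log2(c / total) for c in counts.values())
    return payload + 8 * sum(len(g) for g in ngrams)


def anneal_alphabet(text, steps=2000, t_start=50.0, cooling=0.995, seed=0):
    """Simulated annealing over the set of 2-/3-grams added to the alphabet."""
    rng = random.Random(seed)
    candidates = sorted(
        {text[i:i + n] for n in (2, 3) for i in range(len(text) - n + 1)}
    )
    current = set()
    current_cost = cost_bits(text, current)
    best, best_cost = set(current), current_cost
    if not candidates:
        return best, best_cost
    temperature = t_start
    for _ in range(steps):
        # Neighbour move: toggle membership of one randomly chosen n-gram.
        neighbour = current ^ {rng.choice(candidates)}
        neighbour_cost = cost_bits(text, neighbour)
        delta = neighbour_cost - current_cost
        # Always accept improvements; accept worse states with
        # probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = neighbour, neighbour_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost
```

Calling `anneal_alphabet(sample_text)` returns the best n-gram set found and its estimated cost in bits. Swapping `cost_bits` for the output size of an actual compressor would bring the sketch closer to the setting the abstract describes, at a higher cost per evaluation.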