Title :
Evolutionary synthesis of lossless compression algorithms with GP-zip3
Author :
Kattan, Ahmed ; Poli, Riccardo
Author_Institution :
Sch. of Comput. Sci. & Electron. Eng., Univ. of Essex, Colchester, UK
Abstract :
Here we propose GP-zip3, a system which uses Genetic Programming to find optimal ways to combine standard compression algorithms for the purpose of compressing files and archives. GP-zip3 evolves programs with multiple components. One component analyses statistical features extracted from the raw data to be compressed (seen as a sequence of 8-bit integers) to divide the data into blocks. These blocks are then projected onto a two-dimensional Euclidean space via two further (evolved) program components. K-means clustering is applied to group similar data blocks. Each cluster is then labelled with the optimal compression algorithm for its member blocks. Once a program that achieves good compression is evolved, it can be used on unseen data without the requirement for any further evolution. GP-zip3 is similar to its predecessor, GP-zip2. Both systems outperform a variety of standard compression algorithms and are faster than other evolutionary compression techniques. However, GP-zip2 was still substantially slower than off-the-shelf algorithms. GP-zip3 alleviates this problem by using a novel fitness evaluation strategy. More specifically, GP-zip3 evolves and then uses decision trees to predict the performance of GP individuals without requiring them to be used to compress the training data. As shown in a variety of experiments, this speeds up evolution in GP-zip3 considerably over GP-zip2 while achieving similar compression results, thereby significantly broadening the scope of application of the approach.
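The block pipeline described in the abstract (split into blocks, project each block onto a 2-D space, cluster with K-means, label each cluster with its best compressor) can be illustrated with a minimal sketch. This is not the authors' implementation, and every concrete choice in it is an assumption: the fixed-size splitter and the two hand-picked features (byte entropy and mean byte value) stand in for the evolved program components, the candidate compressors are simply those in the Python standard library, and clusters are labelled by trial compression of the input itself rather than on separate training data. All function names are hypothetical.

```python
import bz2
import lzma
import math
import zlib
from collections import Counter

# Candidate compressors: an illustrative set, not the one used in the paper.
COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
DECOMPRESSORS = {"zlib": zlib.decompress, "bz2": bz2.decompress, "lzma": lzma.decompress}


def split_blocks(data, size=4096):
    """Fixed-size split (assumption; GP-zip3 evolves the splitting component)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def project(block):
    """Map a block to a 2-D point: (normalised byte entropy, normalised mean byte).

    Hand-picked features standing in for the two evolved projection components.
    """
    n = len(block)
    counts = Counter(block)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return entropy / 8.0, sum(block) / (255.0 * n)


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with deterministic, evenly spaced initial centroids."""
    k = min(k, len(points))
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


def label_clusters(blocks, labels):
    """For each cluster, pick the compressor with the smallest total output size."""
    best = {}
    for cluster in set(labels):
        members = [b for b, lab in zip(blocks, labels) if lab == cluster]
        best[cluster] = min(
            COMPRESSORS,
            key=lambda name: sum(len(COMPRESSORS[name](b)) for b in members),
        )
    return best


def compress(data, k=3):
    """Return a list of (compressor name, compressed block) pairs."""
    blocks = split_blocks(data)
    labels = kmeans([project(b) for b in blocks], k)
    best = label_clusters(blocks, labels)
    return [(best[lab], COMPRESSORS[best[lab]](b)) for b, lab in zip(blocks, labels)]


def decompress(packed):
    """Invert compress(): decode each block with the compressor recorded for it."""
    return b"".join(DECOMPRESSORS[name](blob) for name, blob in packed)
```

In this sketch the compressor name is stored next to each block so that decompression is lossless; the paper's evolved programs and its decision-tree fitness prediction are not modelled here.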
Keywords :
data compression; decision trees; evolutionary computation; feature extraction; file organisation; genetic algorithms; pattern clustering; statistical analysis; GP-zip3; K-means clustering; decision tree; evolutionary synthesis; file compression; fitness evaluation strategy; genetic programming; lossless compression algorithm; optimal compression algorithm; standard compression algorithm; statistical feature extraction; two-dimensional Euclidean space; Compression algorithms; Data models; Estimation; Feature extraction; Image coding; Predictive models; Training
Conference_Title :
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6909-3
DOI :
10.1109/CEC.2010.5585956