Title :
Record preprocessing for data compression
Author_Institution :
Dept. "Commun. Syst.", Univ. Duisburg-Essen, Duisburg, Germany
Abstract :
This paper presents a new preprocessing scheme for lossless data compression that exploits the structure of record-based files. Record-based files can be database files, but also image files, in which each row of the image represents a record, or sequential data files. Symbol repetitions that occur at the same position inside a sequence of fixed-length records are detected and treated by a special transformation. The compression rate of such files can often be enhanced if the file is transposed according to the record length before compression, with the transposition reversed after decompression. The presented approach is able to detect files with such a structure and to determine the corresponding record length. The impact on the compression rate is compared for BWT-, PPM- and LZ-based compression algorithms. For some files, a compression gain of more than 80 percent can be reached. The presented approach can be used with all standard compression algorithms, though context-based algorithms tend to exploit the transposed structure better than dictionary-based schemes.
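The transposition idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the record length is detected, so `estimate_record_length` below is a hypothetical stand-in: it scores each candidate length k by how often a byte equals the byte k positions earlier, and prefers the smallest k that scores close to the best one, because multiples of the true record length score about as well. `transpose` and `inverse_transpose` are likewise illustrative names. The paper's "special transformation" for symbol repetitions is not modelled here. Python's `bz2` (BWT-based) and `zlib` (LZ77-based DEFLATE) stand in for the BWT and LZ compressors; no PPM coder is included.

```python
import bz2
import zlib
from typing import Optional


def estimate_record_length(data: bytes, max_len: int = 256,
                           min_score: float = 0.5,
                           tolerance: float = 0.9) -> Optional[int]:
    """Hypothetical heuristic, not the paper's detector.

    Score each candidate length k by the fraction of positions i with
    data[i] == data[i - k]. Return the smallest k whose score is within
    `tolerance` of the best score, or None if no k reaches `min_score`.
    """
    n = len(data)
    scores = {}
    for k in range(2, min(max_len, n // 2) + 1):
        matches = sum(1 for i in range(k, n) if data[i] == data[i - k])
        scores[k] = matches / (n - k)
    if not scores:
        return None
    best = max(scores.values())
    if best < min_score:
        return None
    return min(k for k, s in scores.items() if s >= tolerance * best)


def transpose(data: bytes, r: int) -> bytes:
    """Read data column by column, viewing it as rows of r bytes.

    Trailing bytes that do not fill a whole record are appended unchanged.
    """
    rows = len(data) // r
    body_len = rows * r
    out = bytearray()
    for col in range(r):
        out += data[col:body_len:r]  # byte `col` of every full record
    out += data[body_len:]
    return bytes(out)


def inverse_transpose(data: bytes, r: int) -> bytes:
    """Undo transpose(): scatter each column back to its record offset."""
    rows = len(data) // r
    body_len = rows * r
    out = bytearray(body_len)
    for col in range(r):
        out[col:body_len:r] = data[col * rows:(col + 1) * rows]
    out += data[body_len:]
    return bytes(out)


if __name__ == "__main__":
    # Synthetic fixed-length records: b"AB", a two-digit counter, b"XY\n".
    records = b"".join(b"AB%02dXY\n" % j for j in range(50))
    r = estimate_record_length(records)
    print("estimated record length:", r)
    t = transpose(records, r)
    assert inverse_transpose(t, r) == records  # lossless round trip
    for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress)):
        print(name, len(compress(records)), "->", len(compress(t)))
```

After transposition, equal bytes from the same field of consecutive records become adjacent (here the first 50 bytes are all `A`), which is the kind of structure the abstract suggests context-based coders exploit well. The tie-breaking toward the smallest well-scoring length is a design choice of this sketch, not something the abstract specifies.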
Keywords :
data compression; file organisation; context-based algorithm; database files; image files; lossless data compression; record-based files; sequential data files; symbol repetitions
Conference_Title :
Data Compression Conference, 2004. Proceedings. DCC 2004
Print_ISBN :
0-7695-2082-0
DOI :
10.1109/DCC.2004.1281497