Title :
A new approach of DCA by using BWT
Author :
Zhao, Bo ; Iwata, Ken-Ichi ; Itoh, Shuichi ; Kato, Toshihiko
Author_Institution :
Univ. of Electro-Commun., Chofu, Japan
Abstract :
Summary form only given. M. Crochemore et al. introduced a two-pass lossless data compression scheme called data compression using antidictionaries (DCA) (see Proc. IEEE, vol.88, no.11, p.1756-68, 2000). DCA finds some words (minimal forbidden words) that never appear in the text to be compressed, and chooses a subset of minimal forbidden words as an antidictionary (AD). We develop improved DCAs in two ways. (1) We propose an improved DCA using, instead of the AD, predictable dictionaries made up of the words whose last symbols are uniquely determined by their prefixes. The good properties of the original DCA hold in this extension. (2) We present an algorithm to construct the AD (or predictable dictionary) by using a suffix array of the text. We can reduce the amount of memory necessary to construct an AD by using the suffix array. Moreover, by applying a suffix array instead of a suffix tree, we can exploit the relationship between the suffix array and the BWT (Burrows-Wheeler transform), giving us the chance of choosing between DCA and the Burrows-Wheeler compression algorithm (BWCA).
Keywords :
data compression; dictionaries; text analysis; transform coding; transforms; Burrows-Wheeler compression algorithm; Burrows-Wheeler transform; data compression using antidictionaries; minimal forbidden words; predictable dictionaries; suffix array; suffix tree; Compression algorithms; Data compression; Decoding; Dictionaries; Encoding; Parallel processing; Search engines;
Conference_Titel :
Data Compression Conference, 2005. Proceedings. DCC 2005
Print_ISBN :
0-7695-2309-9