Title :
DCA Using Suffix Arrays
Author :
Fiala, Martin ; Holub, Jan
Author_Institution :
Czech Tech. Univ. in Prague, Prague
Abstract :
DCA (Data Compression using Antidictionaries) is a novel lossless data compression method that works on bit streams, introduced by Crochemore et al. DCA takes advantage of words that do not occur as factors in the text, i.e. words that are forbidden. Because of these forbidden words (antiwords), some symbols in the text can be predicted. We build the antidictionary using a suffix array in time O(k * N log N), where k is the maximal antiword length. The suffix array and LCP array are constructed over the binary alphabet (one entry per bit), so each is 8 times the length of the input text. Even so, the memory required to construct the suffix array and LCP array depends only on the length N of the input text and is O(N), whereas a suffix trie has exponential complexity.
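The prediction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the suffix array is built by naive sorting, the LCP array by direct comparison, and all function names (`to_bits`, `factors_up_to`, `antidictionary`, `compress`, `decompress`) are hypothetical names chosen for this illustration. The sketch collects the factors of length at most k from the suffix array and LCP array, derives the minimal antiwords from them, and drops every bit that an antiword makes predictable.

```python
def to_bits(data: bytes) -> str:
    # Each input byte becomes 8 binary symbols, hence the factor of 8.
    return "".join(f"{byte:08b}" for byte in data)

def suffix_array(bits: str) -> list:
    # Naive construction by sorting suffixes (assumption: fine for a sketch).
    return sorted(range(len(bits)), key=lambda i: bits[i:])

def lcp_array(bits: str, sa: list) -> list:
    # lcp[i] = longest common prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0.
    lcp = [0] * len(sa)
    for i in range(1, len(sa)):
        a, b = bits[sa[i - 1]:], bits[sa[i]:]
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        lcp[i] = n
    return lcp

def factors_up_to(bits: str, k: int) -> set:
    # A suffix contributes new factors only beyond the prefix it shares
    # with its predecessor in suffix-array order.
    sa = suffix_array(bits)
    lcp = lcp_array(bits, sa)
    found = {""}
    for start, shared in zip(sa, lcp):
        longest = min(k, len(bits) - start)
        for n in range(shared + 1, longest + 1):
            found.add(bits[start:start + n])
    return found

def antidictionary(bits: str, k: int) -> set:
    # Minimal antiwords of length <= k: the word is not a factor, but the
    # word without its last bit and without its first bit are both factors.
    known = factors_up_to(bits, k)
    antiwords = set()
    for prefix in known:
        if len(prefix) >= k:
            continue
        for bit in "01":
            word = prefix + bit
            if word not in known and word[1:] in known:
                antiwords.add(word)
    return antiwords

def predicted_bit(history: str, antiwords: set):
    # If history ends with an antiword minus its last bit, the next bit
    # must be the complement of that last bit.
    for word in antiwords:
        if history.endswith(word[:-1]):
            return "1" if word[-1] == "0" else "0"
    return None

def compress(bits: str, antiwords: set) -> str:
    # Keep only the bits that no antiword predicts.
    return "".join(b for i, b in enumerate(bits)
                   if predicted_bit(bits[:i], antiwords) is None)

def decompress(kept: str, length: int, antiwords: set) -> str:
    # Rebuild the text, reading a stored bit only when nothing is predicted.
    stored = iter(kept)
    out = ""
    while len(out) < length:
        guess = predicted_bit(out, antiwords)
        out += guess if guess is not None else next(stored)
    return out
```

For the bit string `01010101` with k = 3, this sketch yields the antidictionary {`00`, `11`}, and only the first bit has to be stored, because every later bit is forced by one of the two antiwords. The decoder needs the antidictionary and the original length in addition to the stored bits.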
Keywords :
computational complexity; data compression; data structures; text analysis; exponential complexity; lossless data compression method; maximal antiword length; suffix array; suffix trie; text symbol prediction; time complexity; Compressors; Computer science; Data compression; Data engineering; Encoding; Optical arrays; Transducers; Data Compression using Antidictionaries
Conference_Title :
Data Compression Conference, 2008. DCC 2008
Conference_Location :
Snowbird, UT
Print_ISBN :
978-0-7695-3121-2
DOI :
10.1109/DCC.2008.95