DocumentCode :
3226552
Title :
DCA Using Suffix Arrays
Author :
Fiala, Martin ; Holub, Jan
Author_Institution :
Czech Tech. Univ. in Prague, Prague
fYear :
2008
fDate :
25-27 March 2008
Firstpage :
516
Lastpage :
516
Abstract :
DCA (Data Compression using Antidictionaries) is a novel lossless data compression method, presented by Crochemore et al., that works on bit streams. DCA takes advantage of words that do not occur as factors in the text, i.e. words that are forbidden. Because of these forbidden words (antiwords), some symbols in the text can be predicted. We build the antidictionary using a suffix array in time O(k * N log N), where k is the maximal antiword length. The suffix array and the LCP array, constructed over the binary alphabet, are 8 times the length of the input text. Even so, the memory requirements for suffix array and LCP construction are O(N), depending only on the length N of the input text, whereas a suffix trie has exponential complexity.
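To make the notion of an antiword concrete, the following is a minimal brute-force sketch, not the suffix-array construction described in the abstract. It assumes an antiword is a minimal forbidden word: a binary word that does not occur in the text although both its longest proper prefix and its longest proper suffix do. The function names `to_bits`, `factors_up_to` and `minimal_antiwords` are hypothetical, chosen only for this illustration.

```python
def to_bits(data: bytes) -> str:
    """Expand each byte into 8 binary symbols, so N bytes give 8 * N symbols."""
    return "".join(format(byte, "08b") for byte in data)


def factors_up_to(text: str, k: int) -> set:
    """Collect every factor (substring) of `text` with length at most k."""
    found = {""}  # the empty word is a factor of any text
    for start in range(len(text)):
        for length in range(1, k + 1):
            if start + length <= len(text):
                found.add(text[start:start + length])
    return found


def minimal_antiwords(text: str, k: int) -> list:
    """Return minimal forbidden words of length <= k over the alphabet {0, 1}.

    Brute force for illustration only; this is NOT the O(k * N log N)
    suffix-array method from the paper.
    """
    factors = factors_up_to(text, k)
    antiwords = []
    for prefix in factors:
        if len(prefix) >= k:
            continue  # extending would exceed the maximal antiword length
        for symbol in "01":
            candidate = prefix + symbol
            # The longest proper prefix is a factor by construction; the
            # candidate is minimal if its longest proper suffix is one too.
            if candidate not in factors and candidate[1:] in factors:
                antiwords.append(candidate)
    return sorted(antiwords, key=lambda word: (len(word), word))


if __name__ == "__main__":
    print(minimal_antiwords("01101", 3))  # ['00', '010', '111']
```

With the antiword `00` in the antidictionary, any `0` in the text must be followed by `1`, so that following symbol can be predicted by the decoder and need not be stored.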
Keywords :
computational complexity; data compression; data structures; text analysis; exponential complexity; lossless data compression method; maximal antiword length; suffix array; suffix trie; text symbol prediction; time complexity; Compressors; Computer science; Data compression; Data engineering; Encoding; Optical arrays; Transducers; Data Compression using Antidictionaries
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 2008. DCC 2008
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-0-7695-3121-2
Type :
conf
DOI :
10.1109/DCC.2008.95
Filename :
4483343