• DocumentCode
    3226552
  • Title

    DCA Using Suffix Arrays

  • Author

    Fiala, Martin ; Holub, Jan

  • Author_Institution
    Czech Tech. Univ. in Prague, Prague
  • fYear
    2008
  • fDate
    25-27 March 2008
  • Firstpage
    516
  • Lastpage
    516
  • Abstract
    DCA (Data Compression using Antidictionaries) is a novel lossless data compression method working on bit streams, presented by Crochemore et al. DCA takes advantage of words that do not occur as factors in the text, i.e. that are forbidden. Due to these forbidden words (antiwords), some symbols in the text can be predicted. We build the antidictionary using a suffix array in time O(k * N log N), where k is the maximal antiword length. Because the suffix array and the LCP array are constructed over the binary alphabet, their length is 8 times the length of the input text. Still, the memory requirements for suffix array and LCP construction depend only on the length N of the input text, i.e. they are O(N), in contrast to the suffix trie with its exponential complexity.
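    To make the notion of an antiword concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes an antiword is a minimal forbidden word: a bit string that does not occur in the text while both its longest proper prefix and its longest proper suffix do. The function names (`to_bits`, `build_suffix_array`, `occurs`, `antiwords`) are hypothetical, the suffix array is built by naive sorting (so the O(k * N log N) bound from the abstract does not apply here), and the LCP array is not used at all.

```python
def to_bits(data: bytes) -> str:
    """Expand each byte into 8 characters '0'/'1' (most significant bit first).

    This is why the suffix array over the binary alphabet has 8 entries
    per input byte.
    """
    return "".join(format(byte, "08b") for byte in data)


def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions lexicographically.

    Assumption: simplicity over speed; a real implementation would use a
    dedicated suffix array construction algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurs(text: str, sa: list[int], word: str) -> bool:
    """Binary search the suffix array for a suffix that starts with `word`."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(word)] < word:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text.startswith(word, sa[lo])


def antiwords(text: str, max_len: int) -> list[str]:
    """Return all minimal forbidden words of `text` up to length `max_len`."""
    sa = build_suffix_array(text)
    found = []
    for length in range(1, max_len + 1):
        # Candidates are factors of length (length - 1) extended by one bit,
        # so the longest proper prefix occurs in the text by construction.
        prefixes = {text[i:i + length - 1]
                    for i in range(len(text) - length + 2)}
        for prefix in sorted(prefixes):
            for bit in "01":
                word = prefix + bit
                if occurs(text, sa, word):
                    continue  # the word is a factor, hence not forbidden
                # Minimality: the longest proper suffix must occur as well.
                if length == 1 or occurs(text, sa, word[1:]):
                    found.append(word)
    return found


print(antiwords("01101", 3))  # ['00', '010', '111']
```

    In this example "00" is an antiword of "01101", so every 0 in the text must be followed by a 1; that second bit is predictable and need not be stored. This is the symbol prediction the abstract refers to.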
  • Keywords
    computational complexity; data compression; data structures; text analysis; exponential complexity; lossless data compression method; maximal antiword length; suffix array; suffix trie; text symbol prediction; time complexity; Compressors; Computer science; Data compression; Data engineering; Encoding; Optical arrays; Transducers; Data Compression using Antidictionaries;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2008. DCC 2008
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-0-7695-3121-2
  • Type

    conf

  • DOI
    10.1109/DCC.2008.95
  • Filename
    4483343