• DocumentCode
    3227074
  • Title

    Suffix Array for Large Alphabet

  • Author

    Sestak, R. ; Lansky, J. ; Zemlicka, Michal

  • Author_Institution
    Charles Univ., Prague
  • fYear
    2008
  • fDate
    25-27 March 2008
  • Firstpage
    543
  • Lastpage
    543
  • Abstract
    The Burrows-Wheeler Transform (BWT) is the main component of block compression, which offers a good balance of speed and compression ratio. Suffix arrays are used in the coding phase of the BWT, and we focus on constructing them for alphabets larger than 256 symbols. The motivation for this work is the XBW software project, an application for compressing large XML files using word- and syllable-based BWT. The role of the BWT is to reorder the input before other algorithms are applied. We describe and implement three families of algorithms for encoding. Finally, we present the algorithm by Karkkainen and Sanders for constructing suffix arrays in linear time.
  • Keywords
    XML; data compression; text analysis; transforms; Burrows-Wheeler transform; XBW software project; XML files compression; alphabet coding; block compression; suffix array; syllable-based BWT; textual files; word-based BWT; Application software; Arithmetic; Data compression; Encoding; Mathematics; Phased arrays; Physics; Sorting; Testing; suffix array sorting; text compression; word-based compression
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 2008. DCC 2008
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-0-7695-3121-2
  • Type

    conf

  • DOI
    10.1109/DCC.2008.22
  • Filename
    4483370
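
The abstract's central idea, building a suffix array over an alphabet larger than 256 symbols and reading the BWT from it, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`tokenize`, `to_symbols`, `suffix_array`, `bwt`, `inverse_bwt`) are hypothetical, the word splitting is a simple regular expression rather than XBW's word or syllable parser, and the suffix array is built by naive sorting instead of the linear-time algorithm by Karkkainen and Sanders.

```python
import re


def tokenize(text):
    # Split into runs of word characters and runs of separators,
    # so that joining the tokens reproduces the text exactly.
    return re.findall(r"\w+|\W+", text)


def to_symbols(tokens):
    # Map each distinct token to an integer symbol. The alphabet size is the
    # vocabulary size, which can easily exceed 256. Symbol 0 is reserved
    # for the end-of-text sentinel.
    vocab = sorted(set(tokens))
    rank = {tok: i + 1 for i, tok in enumerate(vocab)}
    return [rank[t] for t in tokens], vocab


def suffix_array(symbols):
    # Naive construction: sort suffix start positions by the suffixes
    # themselves. Simple but far from linear time (assumption: used here
    # only for illustration).
    return sorted(range(len(symbols)), key=lambda i: symbols[i:])


def bwt(symbols):
    # Append a unique smallest sentinel, then take the symbol that
    # precedes each suffix in sorted order.
    s = symbols + [0]
    sa = suffix_array(s)
    return [s[i - 1] for i in sa], sa


def inverse_bwt(last):
    # A stable sort of the last column yields the first column together
    # with the link from each row to the row one text position later.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out, row = [], order[0]
    for _ in range(len(last) - 1):
        out.append(last[order[row]])
        row = order[row]
    return out


if __name__ == "__main__":
    text = "to be or not to be, that is the question"
    symbols, vocab = to_symbols(tokenize(text))
    last, _ = bwt(symbols)
    restored = "".join(vocab[s - 1] for s in inverse_bwt(last))
    print(restored == text)
```

Because the symbols are plain integers, nothing in the sketch depends on a byte-sized alphabet; only the suffix sorting step would need to change to reach linear time.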