  • DocumentCode
    2477257
  • Title
    Searching BWT compressed text with the Boyer-Moore algorithm and binary search
  • Author
    Bell, Tim ; Powell, Matt ; Mukherjee, Amar ; Adjeroh, Don
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Canterbury, New Zealand
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    112
  • Lastpage
    121
  • Abstract
    This paper explores two techniques for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. The first is an application of the Boyer-Moore algorithm (1977) to a transformed string. The second is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
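The binary-search idea can be illustrated with a minimal Python sketch. It assumes a plain BWT string with a unique sentinel (`"\0"` here) and no later entropy-coding stage, and the function names (`bwt`, `build_psi`, `row_prefix`, `count_occurrences`) are hypothetical illustrations, not the authors' implementation. The sketch sorts the last column L to obtain the first column F and links each sorted row to the row holding the next rotation. That lets it read the leading characters of any sorted row, and so binary-search the rows for a pattern, without decompressing the whole text.

```python
def bwt(text: str, sentinel: str = "\0") -> str:
    """Forward BWT via sorted rotations (quadratic; for illustration only).
    Assumes the sentinel does not occur in text and sorts before all symbols."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def build_psi(last: str) -> tuple[str, list[int]]:
    """Recover the first column F and a mapping psi such that psi[i] is the
    row whose rotation starts one character after the rotation in row i."""
    first = "".join(sorted(last))
    starts: dict[str, int] = {}          # first row of F beginning with each symbol
    for i, c in enumerate(first):
        starts.setdefault(c, i)
    seen: dict[str, int] = {}
    psi = [0] * len(last)
    for j, c in enumerate(last):
        k = seen.get(c, 0)               # this is the k-th occurrence of c in L
        psi[starts[c] + k] = j           # ...and pairs with the k-th c in F
        seen[c] = k + 1
    return first, psi


def row_prefix(first: str, psi: list[int], row: int, m: int) -> str:
    """Read the first m characters of the sorted rotation in the given row."""
    out = []
    for _ in range(m):
        out.append(first[row])
        row = psi[row]
    return "".join(out)


def count_occurrences(last: str, pattern: str) -> int:
    """Count (possibly overlapping) occurrences of a non-empty pattern by
    binary search over the sorted rows implied by the BWT string `last`."""
    first, psi = build_psi(last)
    m, n = len(pattern), len(last)
    # Lower bound: first row whose m-character prefix is >= pattern.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if row_prefix(first, psi, mid, m) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first row whose m-character prefix is > pattern.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if row_prefix(first, psi, mid, m) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start
```

For example, `count_occurrences(bwt("banana"), "ana")` locates the two adjacent sorted rows beginning with `ana`. The sketch only counts matches. Reporting their positions in the original text would need extra offset information that a bare BWT string does not carry.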
  • Keywords
    data compression; search problems; string matching; text analysis; transforms; BWT; Boyer-Moore algorithm; Burrows-Wheeler transform; binary search; compressed text; on-line exact pattern matching; queries; searching; sorted list; substrings; transformed string; Computer science; Data compression; Encoding; Image coding; Pattern matching; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Compression Conference, 2002. Proceedings. DCC 2002
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-1477-4
  • Type
    conf
  • DOI
    10.1109/DCC.2002.999949
  • Filename
    999949