Title :
Searching BWT compressed text with the Boyer-Moore algorithm and binary search
Author :
Bell, Tim ; Powell, Matt ; Mukherjee, Amar ; Adjeroh, Don
Author_Institution :
Dept. of Comput. Sci., Univ. of Canterbury, New Zealand
Abstract :
This paper explores two techniques for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform. The first is an application of the Boyer-Moore algorithm (1977) to a transformed string. The second is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching using a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search is much faster even for large numbers of queries.
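The sorted-substring observation behind the second approach can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the plain text is available and builds an explicit sorted list of suffix start positions (a naive suffix array), whereas the paper works from the BWT output. The names `suffix_array` and `find_occurrences` are hypothetical and chosen only for illustration.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive construction (sorting full suffixes); adequate for a small demo,
    not for large files.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Binary-search the sorted suffixes for every occurrence of pattern.

    All suffixes that start with the pattern form one contiguous run in the
    sorted order, so two binary searches (lower and upper bound) locate it.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Positions inside the run are in suffix order; sort them by text offset.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "mississippi"
    sa = suffix_array(text)
    print(find_occurrences(text, sa, "issi"))  # [1, 4]
```

Each query costs a logarithmic number of prefix comparisons once the sorted order exists, which is consistent with the abstract's remark that binary search remains fast for large numbers of queries.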
Keywords :
data compression; search problems; string matching; text analysis; transforms; BWT; Boyer-Moore algorithm; Burrows-Wheeler transform; binary search; compressed text; on-line exact pattern matching; queries; searching; sorted list; substrings; transformed string; Computer science; Data compression; Encoding; Image coding; Pattern matching; USA Councils
Conference_Title :
Data Compression Conference, 2002. Proceedings. DCC 2002
Print_ISBN :
0-7695-1477-4
DOI :
10.1109/DCC.2002.999949