DocumentCode
2477257
Title
Searching BWT compressed text with the Boyer-Moore algorithm and binary search
Author
Bell, Tim; Powell, Matt; Mukherjee, Amar; Adjeroh, Don
Author_Institution
Dept. of Comput. Sci., Univ. of Canterbury, New Zealand
fYear
2002
fDate
2002
Firstpage
112
Lastpage
121
Abstract
This paper explores two techniques for on-line exact pattern matching in files that have been compressed using the Burrows-Wheeler transform (BWT). The first applies the Boyer-Moore algorithm (1977) to the transformed string. The second is based on the observation that the transform effectively contains a sorted list of all substrings of the original text, which can be exploited for very rapid searching with a variant of binary search. Both methods are faster than a decompress-and-search approach for small numbers of queries, and binary search remains much faster even for large numbers of queries.
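As a rough illustration of the two ideas in the abstract, the following Python sketch shows how both searches can read characters of the original text from the BWT output without a full decompression step. It is not the authors' implementation. Several things are assumptions for illustration: the end marker `SENTINEL = "\0"` (taken to be unique and smaller than every text character), the naive `bwt_encode` helper, and the hypothetical names `build_index`, `row_prefix`, `count_occurrences` and `horspool_search`. The sketch also substitutes Horspool's simplification of Boyer-Moore for the full algorithm, and it obtains random access to text positions by precomputing a row-per-position array; the data structures in the paper may differ.

```python
SENTINEL = "\0"  # assumed end marker: unique and smaller than every text character


def bwt_encode(text: str) -> str:
    """Naive BWT for demonstration only: sort all rotations, keep the last column."""
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_index(last: str):
    """Recover the first column F and a successor map psi from the BWT output.

    A stable sort of row indices by last-column character pairs the k-th
    entry of F with the L position order[k] holding the same character
    occurrence. Row order[k] therefore holds the rotation that starts one
    text position after the rotation in row k, so psi[k] = order[k].
    """
    order = sorted(range(len(last)), key=lambda j: last[j])  # Python's sort is stable
    first = "".join(last[j] for j in order)
    return first, order


def row_prefix(first: str, psi: list, row: int, m: int) -> str:
    """Decode the first m characters of the rotation stored in `row`."""
    out = []
    for _ in range(m):
        out.append(first[row])
        row = psi[row]
    return "".join(out)


def count_occurrences(last: str, pattern: str) -> int:
    """Binary search over the rows of the BWT matrix, which are sorted rotations."""
    first, psi = build_index(last)
    m, n = len(pattern), len(last)

    def bound(strict: bool) -> int:
        # First row whose m-character prefix is >= pattern (or > pattern if strict).
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = row_prefix(first, psi, mid, m)
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    # Rows whose prefix equals the pattern form one contiguous block.
    return bound(strict=True) - bound(strict=False)


def horspool_search(last: str, pattern: str) -> list:
    """Horspool's simplification of Boyer-Moore, reading text characters via F and psi."""
    first, psi = build_index(last)
    n, m = len(last) - 1, len(pattern)  # n excludes the sentinel
    if m == 0:
        return []
    # The row whose last character is the sentinel holds the rotation starting at position 0.
    row = last.index(SENTINEL)
    row_of_pos = []  # assumption: precomputed row for each text position
    for _ in range(n):
        row_of_pos.append(row)
        row = psi[row]
    # Bad-character shift keyed on the text character under the pattern's last slot.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos + m <= n:
        j = m - 1
        while j >= 0 and first[row_of_pos[pos + j]] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        pos += shift.get(first[row_of_pos[pos + m - 1]], m)
    return hits


encoded = bwt_encode("abracadabra")
print(count_occurrences(encoded, "abra"), horspool_search(encoded, "abra"))  # prints: 2 [0, 7]
```

The binary search compares the pattern against only O(log n) rows, decoding at most m characters of each. This matches the abstract's point that the sorted order implicit in the transform supports fast repeated queries. The Horspool routine, by contrast, still scans the text from left to right, skipping ahead using the shift table.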
Keywords
data compression; search problems; string matching; text analysis; transforms; BWT; Boyer-Moore algorithm; Burrows-Wheeler transform; binary search; compressed text; on-line exact pattern matching; queries; searching; sorted list; substrings; transformed string; Computer science; Data compression; Encoding; Image coding; Pattern matching; USA Councils;
fLanguage
English
Publisher
IEEE
Conference_Title
Data Compression Conference, 2002. Proceedings. DCC 2002
ISSN
1068-0314
Print_ISBN
0-7695-1477-4
Type
conf
DOI
10.1109/DCC.2002.999949
Filename
999949