DocumentCode :
2944729
Title :
Compressed Dictionary Matching with One Error
Author :
Hon, Wing-Kai ; Ku, Tsung-Han ; Shah, Rahul ; Thankachan, Sharma V. ; Vitter, Jeffrey Scott
Author_Institution :
Dept of CS, Nat. Tsing Hua Univ., Hsinchu, Taiwan
fYear :
2011
fDate :
29-31 March 2011
Firstpage :
113
Lastpage :
122
Abstract :
Given a set D of d patterns of total length n, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. This problem can be solved in optimal O(|T|+occ) time by the classical AC automaton (Aho and Corasick, 1975) where occ denotes the number of occurrences. The space requirement is O(n) words. In the {approximate} dictionary matching problem with one error, we consider a substring of T[i..j] an occurrence of P whenever the edit distance between T[i..j] and P is at most one. For this problem, the best known indexes are by Cole et al. (2004), which requires O(n+ dlog{d}) words of space and reports all occurrences in O(|T|log{d}log{log{d}}+occ) time, and by Ferragina et al. (1999), which requires O(n^{1+epsilon}) words of space and reports all occurrences in O(|T|loglog n + occ) time. Recently, there have been successes in compressing the dictionary matching index while keeping the query time optimal (Belazzougui, 2010, Hon et al., 2010). However, a compressed index for approximate dictionary matching problem is still open. In this paper, we propose the first such index which requires an optimal nH_k+O(n)+o(nlogsigma)-bit index space, where H_k denotes the kth-order empirical entropy of D, and sigma is the size of alphabet set from which all the characters in D and T are drawn. The query time of our index is O(σ|T|log3 n log log n + occ).
Keywords :
computational complexity; data compression; dictionaries; indexing; query processing; string matching; AC automaton; O(n) word; compressed dictionary matching; kth order empirical entropy; query text; Approximation algorithms; Automata; Computers; Dictionaries; Entropy; Indexes; Pattern matching; Approximate Matching; Compressed Indexes; Dictionary Matching; Suffix Tree;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference (DCC), 2011
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-61284-279-0
Type :
conf
DOI :
10.1109/DCC.2011.18
Filename :
5749469
Link To Document :
بازگشت