Title :
Spatial Approximate String Search
Author :
Feifei Li ; Bin Yao ; Mingwang Tang ; Hadjieleftheriou, MariosMarios
Author_Institution :
Sch. of Comput., Univ. of Utah, Salt Lake City, UT, USA
Abstract :
This work deals with the approximate string search in large spatial databases. Specifically, we investigate range queries augmented with a string similarity search predicate in both euclidean space and road networks. We dub this query the spatial approximate string (SAS) query. In euclidean space, we propose an approximate solution, the MHR-tree, which embeds min-wise signatures into an R-tree. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the subtree of u. We analyze the pruning functionality of such signatures based on the set resemblance between the query string and the q-grams from the subtrees of index nodes. We also discuss how to estimate the selectivity of a SAS query in euclidean space, for which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information stored in the tree. For queries on road networks, we propose a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice. The RSASSOL combines the q-gram-based inverted lists and the reference nodes based pruning. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approaches.
Keywords :
approximation theory; query processing; trees (mathematics); visual databases; Euclidean space; MHR-tree; RSASSOL; SAS query; balanced partitions; baseline algorithm; concise representation; exact method; index node subtrees; large spatial databases; min-wise signature; min-wise signatures; q-gram-based inverted lists; range queries; reference node based pruning; road networks; spatial approximate string query; spatial approximate string search; spatial information; string information; string similarity search; Equations; Estimation; Indexes; Mathematical model; Roads; Spatial databases; Synthetic aperture sonar; Approximate string search; range query; road network; spatial databases;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2012.48