DocumentCode :
3321956
Title :
Efficient Merging and Filtering Algorithms for Approximate String Searches
Author :
Li, Chen ; Lu, Jiaheng ; Lu, Yiming
Author_Institution :
Dept. of Comput. Sci., Univ. of California, Irvine, CA
fYear :
2008
fDate :
7-12 April 2008
Firstpage :
257
Lastpage :
266
Abstract :
We study the following problem: how to efficiently find in a collection of strings those similar to a given query string? Various similarity functions can be used, such as edit distance, Jaccard similarity, and cosine similarity. This problem is of great interest to a variety of applications that need high real-time performance, such as data cleaning, query relaxation, and spellchecking. Several algorithms have been proposed based on the idea of merging inverted lists of grams generated from the strings. In this paper we make two contributions. First, we develop several algorithms that can greatly improve the performance of existing algorithms. Second, we study how to integrate existing filtering techniques with these algorithms, and show that they should be used together judiciously, since the way to do the integration can greatly affect the performance. We have conducted experiments on several real data sets to evaluate the proposed techniques.
Keywords :
information filtering; merging; string matching; approximate string search; merging-filtering algorithm; Cleaning; Computer science; Databases; Dictionaries; Filtering algorithms; Management information systems; Merging; Postal services;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
Type :
conf
DOI :
10.1109/ICDE.2008.4497434
Filename :
4497434