DocumentCode :
661901
Title :
Prefix filtering with data partitioning for similarity join
Author :
Bhirakit, Methus ; Chongstitvatana, Jaruloj
Author_Institution :
Dept. of Math. & Comput. Sci., Chulalongkorn Univ., Bangkok, Thailand
fYear :
2013
fDate :
4-6 Sept. 2013
Firstpage :
163
Lastpage :
167
Abstract :
Many applications, such as data integration and data preparation, use similarity join as an important operation. In real-world applications, similarity joins become challenging when the data sets are large. Filter-and-verify methods have been proposed to reduce the running time of similarity join. The prefix filtering algorithm, one of the filter-and-verify methods, filters out some dissimilar strings by examining only the prefixes of strings instead of the whole strings. In this paper, we propose a data partitioning method for prefix filtering in similarity join. In our approach, the database is divided into partitions and prefix filtering is performed on each partition of data. The proposed algorithm supports parallelism because filtering can be done on each partition independently. Furthermore, when the dataset is partitioned into smaller sets, a proper prefix length can be determined for each data partition. This also improves the selection of candidate strings and reduces the verification time. An experiment is performed to compare the proposed algorithm with state-of-the-art algorithms. The experiment shows that our method achieves higher performance by reducing the number of candidate strings and by executing in parallel.
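The filter-and-verify idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes records are token sets compared by Jaccard similarity, uses the standard prefix length |x| - ceil(t|x|) + 1 for every record rather than a prefix length chosen per partition, and partitions the data round-robin. All function names (`prefix_length`, `join_partitions`, `similarity_join`) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations_with_replacement


def jaccard(a, b):
    """Jaccard similarity of two non-empty token sets."""
    return len(a & b) / len(a | b)


def prefix_length(size, threshold):
    """Standard Jaccard prefix length: size - ceil(threshold * size) + 1.
    The epsilon guards against float round-up; it can only lengthen the prefix."""
    return size - math.ceil(threshold * size - 1e-9) + 1


def prefix(tokens, threshold, rank):
    """Prefix of a record under the global token ordering (rarest tokens first)."""
    ordered = sorted(tokens, key=rank.__getitem__)
    return ordered[:prefix_length(len(ordered), threshold)]


def join_partitions(part_a, part_b, threshold, rank):
    """Filter: index the prefixes of part_a and probe with the prefixes of part_b.
    Verify: compute the exact Jaccard similarity for each candidate pair."""
    index = defaultdict(list)
    for rid, tokens in part_a:
        for tok in prefix(tokens, threshold, rank):
            index[tok].append(rid)
    sets_a = dict(part_a)
    result = set()
    for rid_b, tokens_b in part_b:
        candidates = set()
        for tok in prefix(tokens_b, threshold, rank):
            candidates.update(index.get(tok, ()))
        for rid_a in candidates:
            if rid_a != rid_b and jaccard(sets_a[rid_a], tokens_b) >= threshold:
                result.add((min(rid_a, rid_b), max(rid_a, rid_b)))
    return result


def similarity_join(records, threshold, n_parts=2):
    """Self-join over round-robin partitions (an assumed partitioning scheme).
    Each (i, j) partition pair is an independent task, so the loop below
    could be distributed across workers."""
    freq = Counter(tok for tokens in records for tok in tokens)
    rank = {tok: i for i, tok in enumerate(sorted(freq, key=lambda t: (freq[t], t)))}
    numbered = list(enumerate(records))
    parts = [numbered[i::n_parts] for i in range(n_parts)]
    result = set()
    for i, j in combinations_with_replacement(range(n_parts), 2):
        result |= join_partitions(parts[i], parts[j], threshold, rank)
    return result
```

Calling `similarity_join(records, 0.6)` on a list of token sets returns the set of index pairs whose Jaccard similarity is at least 0.6. The token ordering is computed once over the whole data set so that prefixes from different partitions remain comparable; this is a design choice of the sketch, not something stated in the abstract.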
Keywords :
database indexing; parallel algorithms; string matching; text analysis; data integration; data partitioning; data preparation; database partitions; dissimilar strings; parallel algorithm; parallel execution; prefix filtering algorithm; prefix length; similarity join; string prefix; text database; text document indexing; verify method; Computer science; Conferences; Similarity join; data partitioning; parallel join; prefix filtering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Engineering Conference (ICSEC), 2013 International
Conference_Location :
Nakorn Pathom
Print_ISBN :
978-1-4673-5322-9
Type :
conf
DOI :
10.1109/ICSEC.2013.6694772
Filename :
6694772