DocumentCode :
3013072
Title :
A cost model and index architecture for the similarity join
Author :
Böhm, Christian ; Kriegel, Hans-Peter
Author_Institution :
Munich Univ., Germany
fYear :
2001
fDate :
2001
Firstpage :
411
Lastpage :
420
Abstract :
The similarity join is an important database primitive which has been successfully applied to speed up data mining algorithms. In the similarity join, two point sets of a multidimensional vector space are combined such that the result contains all point pairs where the distance does not exceed a parameter ε. Due to its high practical relevance, many similarity join algorithms have been devised. The authors propose an analytical cost model for the similarity join operation based on indexes. Our problem analysis reveals a serious optimization conflict between CPU time and I/O time: fine-grained index structures are beneficial for CPU efficiency, but deteriorate the I/O performance. As a consequence of this observation, we propose a new index architecture and join algorithm which allow a separate optimization of CPU time and I/O time. Our solution utilizes large pages which are optimized for I/O processing. The pages accommodate a search structure which minimizes the computational effort. In the experimental evaluation, a substantial improvement over competitive techniques is shown.
Keywords :
database indexing; optimisation; query processing; relational algebra; tree data structures; CPU efficiency; CPU time; I/O performance; I/O processing; I/O time; analytical cost model; competitive techniques; computational effort; cost model; data mining algorithms; database primitive; experimental evaluation; fine-grained index structures; index architecture; join algorithm; large pages; multidimensional vector space; optimization conflict; point pairs; point sets; practical relevance; problem analysis; search structure; similarity join; similarity join algorithms; similarity join operation; Algorithm design and analysis; Biomedical imaging; Clustering algorithms; Costs; Data mining; Image analysis; Multidimensional systems; Performance analysis; Spatial databases; Time series analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2001. Proceedings. 17th International Conference on
Conference_Location :
Heidelberg
ISSN :
1063-6382
Print_ISBN :
0-7695-1001-9
Type :
conf
DOI :
10.1109/ICDE.2001.914854
Filename :
914854