DocumentCode
3013072
Title
A cost model and index architecture for the similarity join
Author
Böhm, Christian ; Kriegel, Hans-Peter
Author_Institution
Munich Univ., Germany
fYear
2001
fDate
2001
Firstpage
411
Lastpage
420
Abstract
The similarity join is an important database primitive which has been successfully applied to speed up data mining algorithms. In the similarity join, two point sets of a multidimensional vector space are combined such that the result contains all point pairs whose distance does not exceed a parameter ε. Due to its high practical relevance, many similarity join algorithms have been devised. We propose an analytical cost model for the similarity join operation based on indexes. Our problem analysis reveals a serious optimization conflict between CPU time and I/O time: fine-grained index structures are beneficial for CPU efficiency but deteriorate I/O performance. As a consequence of this observation, we propose a new index architecture and join algorithm which allow CPU time and I/O time to be optimized separately. Our solution utilizes large pages which are optimized for I/O processing. The pages accommodate a search structure which minimizes the computational effort. In the experimental evaluation, a substantial improvement over competitive techniques is shown.
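The join predicate defined above can be illustrated with a minimal Python sketch. It assumes Euclidean distance and uses hypothetical function names (`nested_loop_join`, `grid_join`). It does not reproduce the authors' index architecture. `nested_loop_join` states the definition directly by comparing every pair. `grid_join` is a swapped-in, commonly used CPU-side filter: it hashes one point set into grid cells of side ε and compares each point only against the same and adjacent cells. Because the 3^d adjacent cells grow exponentially with the dimension d, this filter only suits low-dimensional data.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nested_loop_join(r: Sequence[Point], s: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """Definition of the similarity join: all index pairs (i, j) with dist(r[i], s[j]) <= eps."""
    return [(i, j) for i, p in enumerate(r) for j, q in enumerate(s)
            if math.dist(p, q) <= eps]


def _cell(p: Point, eps: float) -> Tuple[int, ...]:
    # Index of the grid cell (side length eps) that contains point p.
    return tuple(math.floor(c / eps) for c in p)


def grid_join(r: Sequence[Point], s: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """Same result as nested_loop_join, but only compares points in the same or adjacent cells.

    Hypothetical illustration: if dist(p, q) <= eps, every coordinate differs by at most eps,
    so the cells of p and q differ by at most 1 in each dimension.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for j, q in enumerate(s):
        buckets[_cell(q, eps)].append(j)
    result: List[Tuple[int, int]] = []
    for i, p in enumerate(r):
        home = _cell(p, eps)
        # Visit the 3^d cells surrounding (and including) the home cell of p.
        for offset in itertools.product((-1, 0, 1), repeat=len(home)):
            key = tuple(h + o for h, o in zip(home, offset))
            for j in buckets.get(key, ()):
                if math.dist(p, s[j]) <= eps:
                    result.append((i, j))
    return sorted(result)
```

For example, with `r = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]`, `s = [(0.5, 0.0), (4.6, 5.0), (9.0, 9.0)]` and `eps = 0.6`, both functions return `[(0, 0), (2, 1)]`. The grid variant only reduces distance computations; it says nothing about I/O, which is the other side of the CPU/I/O conflict analyzed in the paper.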
Keywords
database indexing; optimisation; query processing; relational algebra; tree data structures; CPU efficiency; CPU time; I/O performance; I/O processing; I/O time; analytical cost model; competitive techniques; computational effort; cost model; data mining algorithms; database primitive; experimental evaluation; fine-grained index structures; index architecture; join algorithm; large pages; multidimensional vector space; optimization conflict; point pairs; point sets; practical relevance; problem analysis; search structure; similarity join; similarity join algorithms; similarity join operation; Algorithm design and analysis; Biomedical imaging; Clustering algorithms; Costs; Data mining; Image analysis; Multidimensional systems; Performance analysis; Spatial databases; Time series analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2001. Proceedings. 17th International Conference on
Conference_Location
Heidelberg
ISSN
1063-6382
Print_ISBN
0-7695-1001-9
Type
conf
DOI
10.1109/ICDE.2001.914854
Filename
914854