• DocumentCode
    2638802
  • Title
    High dimensional similarity joins: algorithms and performance evaluation
  • Author
    Koudas, Nick; Sevcik, K.C.
  • Author_Institution
    Dept. of Comput. Sci., Toronto Univ., Ont., Canada
  • fYear
    1998
  • fDate
    23-27 Feb 1998
  • Firstpage
    466
  • Lastpage
    475
  • Abstract
    Current data repositories include a variety of data types, including audio, images and time series. State-of-the-art techniques for indexing such data and doing query processing rely on a transformation of data elements into points in a multidimensional feature space. Indexing and query processing then take place in the feature space. We study algorithms for finding relationships among points in multidimensional feature spaces, specifically algorithms for multidimensional joins. Like joins of conventional relations, correlations between multidimensional feature spaces can offer valuable information about the data sets involved. We present several algorithmic paradigms for solving the multidimensional join problem, and we discuss their features and limitations. We propose a generalization of the Size Separation Spatial Join algorithm, named Multidimensional Spatial Join (MSJ), to solve the multidimensional join problem. We evaluate MSJ along with several other specific algorithms, comparing their performance for various dimensionalities on both real and synthetic multidimensional data sets. Our experimental results indicate that MSJ, which is based on space filling curves, consistently yields good performance across a wide range of dimensionalities.
  • Keywords
    multimedia computing; query processing; relational algebra; spatial data structures; Multidimensional Spatial Join; Size Separation Spatial Join algorithm; algorithmic paradigms; data elements; data repositories; data sets; data types; high dimensional similarity joins; multidimensional feature space; multidimensional join problem; performance evaluation; space filling curves; synthetic multidimensional data sets; time series; Computer science; Data mining; Feature extraction; Image databases; Indexing; Multidimensional systems; Multimedia databases; Query processing; Spatial databases; Visual databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 1998. Proceedings., 14th International Conference on
  • Conference_Location
    Orlando, FL
  • ISSN
    1063-6382
  • Print_ISBN
    0-8186-8289-2
  • Type
    conf
  • DOI
    10.1109/ICDE.1998.655809
  • Filename
    655809