• DocumentCode
    1336106
  • Title
    High dimensional similarity joins: algorithms and performance evaluation
  • Author
    Koudas, Nick ; Sevcik, Kenneth C.
  • Author_Institution
    AT&T Bell Labs., Florham Park, NJ, USA
  • Volume
    12
  • Issue
    1
  • fYear
    2000
  • Firstpage
    3
  • Lastpage
    18
  • Abstract
    Current data repositories hold a variety of data types, including audio, images, and time series. State-of-the-art techniques for indexing such data and processing queries rely on a transformation of data elements into points in a multidimensional feature space. Indexing and query processing then take place in the feature space. We study algorithms for finding relationships among points in multidimensional feature spaces, specifically algorithms for multidimensional joins. Like joins of conventional relations, correlations between multidimensional feature spaces can offer valuable information about the data sets involved. We present several algorithmic paradigms for solving the multidimensional join problem and we discuss their features and limitations. We propose a generalization of the size separation spatial join algorithm, named multidimensional spatial join (MSJ), to solve the multidimensional join problem. We evaluate MSJ along with several other specific algorithms, comparing their performance for various dimensionalities on both real and synthetic multidimensional data sets. Our experimental results indicate that MSJ, which is based on space-filling curves, consistently yields good performance across a wide range of dimensionalities.
  • Keywords
    database indexing; database theory; merging; query processing; relational algebra; relational databases; software performance evaluation; sorting; spatial data structures; audio; data repositories; data sets; data structures; data types; experiment; high dimensional similarity joins; image database; indexing; multidimensional feature space; multidimensional spatial join; performance evaluation; query processing; size separation spatial join algorithm; sort merge joins; space filling curves; time series; Computer Society; Data mining; Data structures; Feature extraction; Filling; Image databases; Indexing; Multidimensional systems; Multimedia databases; Query processing
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/69.842246
  • Filename
    842246