• DocumentCode
    3363471
  • Title
    Indexing high-dimensional data for content-based retrieval in large databases
  • Author
    Fonseca, Manuel J. ; Jorge, Joaquim A.
  • Author_Institution
    Dept. of Inf. Syst. & Comput. Sci., Tech. Univ. Lisbon, Portugal
  • fYear
    2003
  • fDate
    26-28 March 2003
  • Firstpage
    267
  • Lastpage
    274
  • Abstract
    Many indexing approaches for high-dimensional data points have evolved into very complex and hard-to-code algorithms. Sometimes this complexity is not matched by an increase in performance. Motivated by these ideas, we take a step back and look at simpler approaches to indexing multimedia data. In this paper we propose a simple (not simplistic) yet efficient indexing structure for high-dimensional data points of variable dimension, using dimension reduction. Our approach maps multidimensional points to a 1D line by computing their Euclidean norm and uses a B+-Tree to store data points. We exploit the B+-Tree's efficient sequential search to develop simple yet performant methods to implement point, range and nearest-neighbor queries. To evaluate our technique we conducted a set of experiments, using both synthetic and real data. We analyze creation, insertion and query times as a function of data set size and dimension. Results so far show that our simple scheme outperforms current approaches, such as the Pyramid Technique, the A-Tree and the SR-Tree, for many data distributions. Moreover, our approach seems to scale better both with growing dimensionality and data set size, while exhibiting low insertion and search times.
  • Keywords
    content-based retrieval; database indexing; multimedia databases; tree data structures; very large databases; visual databases; A-Tree; B+ tree; Euclidean Norm; Pyramid Technique; SR-Tree; content-based retrieval; data set size; dimension reduction; experiments; high-dimensional data indexing; high-dimensional data points; large databases; multidimensional points; multimedia data; nearest-neighbor queries; performance; sequential search; Computer science; Content based retrieval; Data structures; Image databases; Indexing; Information retrieval; Information systems; Nearest neighbor searches; Spatial databases; Technical drawing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on
  • Conference_Location
    Kyoto, Japan
  • Print_ISBN
    0-7695-1895-8
  • Type
    conf
  • DOI
    10.1109/DASFAA.2003.1192391
  • Filename
    1192391
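
  • Illustrative sketch
    The abstract describes mapping each point to its Euclidean norm and searching an ordered structure sequentially. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a sorted Python list with `bisect` as a stand-in for the B+-Tree, assumes all points share one dimension (the paper also handles variable dimension), and the names `NormIndex`, `range_query` and `knn` are hypothetical. Candidate pruning relies on the reverse triangle inequality: | ||p|| - ||q|| | <= dist(p, q).

```python
import bisect
import heapq
import math


def norm(p):
    """Euclidean norm of a point, used as its 1D key."""
    return math.sqrt(sum(x * x for x in p))


def dist(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class NormIndex:
    """Sorted list of (norm, point) pairs; a stand-in for a B+-Tree."""

    def __init__(self, points=()):
        self._entries = sorted((norm(p), tuple(p)) for p in points)

    def insert(self, p):
        bisect.insort(self._entries, (norm(p), tuple(p)))

    def range_query(self, q, r):
        """Return all points within distance r of q."""
        nq = norm(q)
        # Only keys in [nq - r, nq + r] can hold matches.
        start = bisect.bisect_left(self._entries, (nq - r,))
        found = []
        for key, p in self._entries[start:]:
            if key > nq + r:
                break
            if dist(p, q) <= r:  # filter out false positives
                found.append(p)
        return found

    def knn(self, q, k):
        """Return the k nearest points to q, closest first."""
        nq = norm(q)
        right = bisect.bisect_left(self._entries, (nq,))
        left = right - 1
        best = []  # max-heap of (-distance, point)
        n = len(self._entries)
        while left >= 0 or right < n:
            gap_l = nq - self._entries[left][0] if left >= 0 else math.inf
            gap_r = self._entries[right][0] - nq if right < n else math.inf
            # Visit the side whose key is closer to ||q||.
            if gap_l <= gap_r:
                gap, p = gap_l, self._entries[left][1]
                left -= 1
            else:
                gap, p = gap_r, self._entries[right][1]
                right += 1
            # The key gap is a lower bound on the true distance, so once
            # it exceeds the current k-th best, no later point can qualify.
            if len(best) == k and gap > -best[0][0]:
                break
            d = dist(p, q)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
        return [p for _, p in sorted((-nd, p) for nd, p in best)]
```

    A point query is the special case `range_query(q, 0.0)`, up to floating-point rounding. A real B+-Tree would replace the list slice with a leaf-level scan, which is the sequential search the abstract refers to.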