• DocumentCode
    3534637
  • Title

    Metric Index: An Efficient and Scalable Solution for Similarity Search

  • Author

    Novak, David ; Batko, Michal

  • Author_Institution
    Masaryk Univ., Brno, Czech Republic
  • fYear
    2009
  • fDate
    29-30 Aug. 2009
  • Firstpage
    65
  • Lastpage
    73
  • Abstract
    Metric space, as a universal and versatile model of similarity, can be applied in various areas of non-text information retrieval. However, a general, efficient, and scalable solution for metric data management remains an open research challenge. We introduce a novel indexing and searching mechanism called the Metric Index (M-Index), which employs practically all known principles of metric space partitioning, pruning, and filtering. The heart of the M-Index is a general mapping mechanism that makes it possible to store the data in well-established structures such as the B+-tree, or even in distributed storage. We have implemented the M-Index with a B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands of digital images. The experiments test several M-Index variants and compare them with two orthogonal approaches: the PM-Tree and iDistance. The trials show that the M-Index outperforms the others in terms of search-space pruning efficiency, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close together in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving very high recall as the dataset grows.
  • Keywords
    indexing; information filtering; tree data structures; B+-tree; approximation algorithm; distributed storage; mapping mechanism; metric data management; metric index; metric space partitioning; metric space pruning; nontext information retrieval; novel indexing; similarity search; Delay; Digital images; Extraterrestrial measurements; Filtering; Heart; Image databases; Indexing; Information retrieval; MPEG 7 Standard; Testing; approximation; data structure; metric space; scalability
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Similarity Search and Applications, 2009. SISAP '09. Second International Workshop on
  • Conference_Location
    Prague
  • Print_ISBN
    978-0-7695-3765-8
  • Type

    conf

  • DOI
    10.1109/SISAP.2009.26
  • Filename
    5272384