Title :
Metric Index: An Efficient and Scalable Solution for Similarity Search
Author :
Novak, David; Batko, Michal
Author_Institution :
Masaryk Univ., Brno, Czech Republic
Abstract :
Metric space, as a universal and versatile model of similarity, can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management is still a persistent research challenge. We introduce a novel indexing and searching mechanism called the metric index (M-Index), which employs practically all known principles of metric space partitioning, pruning and filtering. The heart of the M-Index is a general mapping mechanism that enables the data to be stored in well-established structures such as the B+-tree, or even in distributed storage. We have implemented the M-Index with a B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands of digital images. The experiments test several M-Index variants and compare them with two orthogonal approaches: the PM-Tree and the iDistance. The trials show that the M-Index outperforms the others in terms of search-space pruning efficiency, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close together in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving a very high recall as the dataset grows.
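The abstract does not state the M-Index key formula, so the following is only a hypothetical, one-level sketch of the general idea it describes: mapping metric objects to scalar keys so that an ordinary ordered structure can hold them, combined with partition pruning and pivot filtering. It assumes an iDistance-style key `i * c + d(p_i, o)` (object `o` assigned to its closest pivot `p_i`, constant `c` larger than any distance) and uses a sorted Python list searched with `bisect` as a stand-in for the B+-tree. The class name `ToyMIndex`, the constant `c`, and the pivots are illustrative assumptions; the paper's actual mapping is not reproduced here and may differ.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


class ToyMIndex:
    """Hypothetical one-level sketch of a pivot-based key mapping (not the paper's code).

    Each object goes to the cell of its closest pivot i and is stored under the
    scalar key  i * c + d(p_i, o),  where c must exceed every object-pivot distance.
    A sorted list searched with bisect stands in for the B+-tree.
    """

    def __init__(self, pivots: Sequence[Point], dist: Metric = math.dist, c: float = 10.0):
        self.pivots = list(pivots)
        self.dist = dist
        self.c = c
        self.keys: List[float] = []
        # Parallel to self.keys: (cell index, object, distances to all pivots).
        self.entries: List[Tuple[int, Point, List[float]]] = []

    def insert(self, obj: Point) -> None:
        dists = [self.dist(p, obj) for p in self.pivots]
        cell = min(range(len(dists)), key=dists.__getitem__)
        if dists[cell] >= self.c:
            raise ValueError("constant c must exceed every object-pivot distance")
        key = cell * self.c + dists[cell]
        pos = bisect.bisect_right(self.keys, key)
        self.keys.insert(pos, key)
        self.entries.insert(pos, (cell, obj, dists))

    def range_query(self, q: Point, r: float) -> List[Point]:
        qd = [self.dist(p, q) for p in self.pivots]
        closest = min(qd)
        result: List[Point] = []
        for i, dqp in enumerate(qd):
            # Partition pruning: every o in cell i satisfies d(q,p_i) - d(q,p_j) <= 2r.
            if dqp - closest > 2 * r:
                continue
            # Key interval: d(o,p_i) must lie in [d(q,p_i) - r, d(q,p_i) + r].
            lo = bisect.bisect_left(self.keys, i * self.c + max(0.0, dqp - r))
            hi = bisect.bisect_right(self.keys, i * self.c + dqp + r)
            for cell, obj, od in self.entries[lo:hi]:
                if cell != i:
                    continue  # the key interval spilled into a later cell
                # Pivot filtering: |d(q,p_j) - d(o,p_j)| is a lower bound on d(q,o).
                if any(abs(a - b) > r for a, b in zip(qd, od)):
                    continue
                if self.dist(q, obj) <= r:  # final verification with the real metric
                    result.append(obj)
        return result


if __name__ == "__main__":
    idx = ToyMIndex(pivots=[(0.0, 0.0), (1.0, 1.0)])
    for pt in [(0.1, 0.1), (0.2, 0.0), (0.9, 0.9), (0.5, 0.6)]:
        idx.insert(pt)
    print(sorted(idx.range_query((0.15, 0.05), 0.1)))  # [(0.1, 0.1), (0.2, 0.0)]
```

Only objects whose keys fall in the computed intervals are touched, which is what lets a plain ordered index (here a list, in the paper a B+-tree) answer metric range queries; the triangle-inequality checks then discard most candidates before the exact distance is evaluated.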
Keywords :
indexing; information filtering; tree data structures; B+-tree; approximation algorithm; distributed storage; mapping mechanism; metric data management; metric index; metric space partitioning; metric space pruning; nontext information retrieval; novel indexing; similarity search; Delay; Digital images; Extraterrestrial measurements; Filtering; Heart; Image databases; Indexing; Information retrieval; MPEG 7 Standard; Testing; approximation; data structure; metric space; scalability
Conference_Title :
Similarity Search and Applications, 2009. SISAP '09. Second International Workshop on
Conference_Location :
Prague
Print_ISBN :
978-0-7695-3765-8
DOI :
10.1109/SISAP.2009.26