• Title of article

    Large-scale similarity data management with distributed Metric Index

  • Author/Authors

    David Novak, Michal Batko, Pavel Zezula

  • Issue Information
    Bimonthly, serial issue, year 2012
  • Pages
    18
  • From page
    855
  • To page
    872
  • Abstract
    Metric space is a universal and versatile model of similarity that can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management remains an open research challenge. In this work, we try to make an important step towards such a management system, one that would be able to scale to data collections of billions of objects. We propose a distributed index structure for similarity data management called the Metric Index (M-Index), which can answer queries in either a precise or an approximate manner. This technique can take advantage of any distributed hash table that supports interval queries and use it as the underlying index. We have performed numerous experiments to test various settings of the M-Index structure, and we have demonstrated its usability by developing a full-featured, publicly available Web application.
  • Keywords
    Similarity search, Performance tuning, Scalability, Peer-to-peer structured networks, Distributed data structures, Metric space
  • Journal title
    Information Processing and Management
  • Serial Year
    2012
  • Record number

    1229283