• DocumentCode
    967688
  • Title
    Efficient Similarity Search in Nonmetric Spaces with Local Constant Embedding
  • Author
    Chen, Lei ; Lian, Xiang
  • Author_Institution
    Hong Kong Univ. of Sci. & Technol., Hong Kong
  • Volume
    20
  • Issue
    3
  • fYear
    2008
  • fDate
    3/1/2008
  • Firstpage
    321
  • Lastpage
    336
  • Abstract
    Similarity-based search is a key component of many applications such as multimedia retrieval, data mining, and Web search and retrieval. There are two important issues in similarity search, namely, designing a distance function to measure similarity and improving search efficiency. Many distance functions have been proposed that attempt to closely mimic human recognition. Unfortunately, some of these well-designed distance functions do not satisfy the triangle inequality and are therefore nonmetric. As a consequence, efficient retrieval using these nonmetric distance functions becomes more challenging, since most existing index structures assume that the indexed distance functions are metric. In this paper, we address this problem by proposing an efficient method, local constant embedding (LCE), which divides the data set into disjoint groups so that the triangle inequality holds within each group after constant shifting. Furthermore, we design a pivot selection approach for the converted metric distance and create an index structure to improve retrieval efficiency. We also propose a novel method to answer approximate similarity search queries in the nonmetric space with guaranteed query accuracy. Extensive experiments show that our method works well on various nonmetric distance functions and, with no false dismissals, improves retrieval efficiency by an order of magnitude compared to the linear scan and existing retrieval approaches.
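The constant-shifting idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' LCE algorithm: the grouping strategy, pivot selection, and index structure are not reproduced, and the function names (`min_shift_constant`, `shift`, `satisfies_triangle`) are hypothetical. The sketch assumes a single group given as a symmetric distance matrix and finds the smallest constant c such that d'(x, y) = d(x, y) + c (for x != y) satisfies the triangle inequality. Squared Euclidean distance on three 1-D points serves as the nonmetric example.

```python
from itertools import permutations


def min_shift_constant(dist):
    """Smallest c >= 0 such that adding c to every off-diagonal entry
    makes the triangle inequality hold for all triples in the group."""
    c = 0.0
    for x, y, z in permutations(range(len(dist)), 3):
        # d(x,z) + c <= d(x,y) + c + d(y,z) + c  <=>  c >= d(x,z) - d(x,y) - d(y,z)
        c = max(c, dist[x][z] - dist[x][y] - dist[y][z])
    return c


def shift(dist, c):
    """Add c to every off-diagonal distance; self-distances stay 0."""
    n = len(dist)
    return [[0.0 if i == j else dist[i][j] + c for j in range(n)]
            for i in range(n)]


def satisfies_triangle(dist, tol=1e-12):
    """Check d(x,z) <= d(x,y) + d(y,z) for all distinct triples."""
    return all(dist[x][z] <= dist[x][y] + dist[y][z] + tol
               for x, y, z in permutations(range(len(dist)), 3))


# Toy group: squared Euclidean distance on the 1-D points 0, 1, 2.
points = [0.0, 1.0, 2.0]
group = [[(a - b) ** 2 for b in points] for a in points]

c = min_shift_constant(group)   # the triple (0, 1, 2) violates by 4 - 1 - 1
shifted = shift(group, c)

print(satisfies_triangle(group))    # False
print(c)                            # 2.0
print(satisfies_triangle(shifted))  # True
```

Because every off-diagonal distance in the group moves by the same amount, the ordering of distances inside the group is unchanged. Computing a separate constant per group, rather than one for the whole data set, is what allows each shift to stay small in this sketch.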
  • Keywords
    query formulation; distance function; local constant embedding; similarity-based search; Algorithm design and analysis; Character recognition; Data mining; Extraterrestrial measurements; Horses; Humans; Information retrieval; Jacobian matrices; Robustness; Web search; Multimedia databases; Query processing;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2007.190700
  • Filename
    4378373