• DocumentCode
    3576337
  • Title

    Semi-randomized hashing for large scale data retrieval

  • Author

    Haichuan Yang ; Xiao Bai ; Jun Zhou ; Peng Ren ; Jian Cheng ; Lu Bai

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
  • fYear
    2014
  • Firstpage
    53
  • Lastpage
    58
  • Abstract
    In information retrieval, efficiently performing nearest neighbor search on large-scale databases is a great challenge. Hashing-based indexing methods represent each data instance as a binary string in order to retrieve approximate nearest neighbors. In this paper, we present a semi-randomized hashing approach that preserves Euclidean distance with binary codes. Euclidean distance preservation is a classic research problem in hashing. Most hashing methods use either a purely randomized or an optimized learning strategy to achieve this goal. Our method, on the other hand, combines both randomized and optimized strategies. It starts by generating multiple random vectors and then approximates them by a single projection vector. In the quantization step, it uses an orthogonal transformation to minimize an upper bound on the deviation between real-valued vectors and binary codes. The proposed method overcomes the problem that randomized hash functions are isolated from the data distribution. Moreover, our method supports an arbitrary number of hash functions, which is beneficial for building better hashing methods. Experiments show that our approach outperforms alternative state-of-the-art methods for retrieval on a large-scale dataset.
  • Keywords
    binary codes; file organisation; indexing; information retrieval; vectors; Euclidean distance preserving; approximate nearest neighbors; binary codes; binary string; data distribution; database; hashing based indexing methods; information retrieval; large scale data retrieval; nearest neighbor search; optimized learning strategy; orthogonal transformation; quantization step; randomized hash functions; randomized learning strategy; real-valued vectors; semirandomized hashing; single projection vector; Binary codes; Equations; Euclidean distance; Principal component analysis; Training; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Science and Advanced Analytics (DSAA), 2014 International Conference on
  • Type

    conf

  • DOI
    10.1109/DSAA.2014.7058051
  • Filename
    7058051
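
The abstract contrasts the proposed method with "purely randomized" hashing. As background only, the following is a minimal sketch of that generic baseline (sign-of-random-projection hashing with Hamming-distance lookup), not the semi-randomized method of the paper, whose details the abstract does not give. All function names, the Gaussian sampling, and the fixed seed are assumptions made for illustration. Note that this baseline is scale-invariant, so it reflects the angle between vectors rather than Euclidean distance directly.

```python
import random


def make_hash_functions(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian projection vectors (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(x, projections):
    """Binary code of x: one bit per projection, set when the dot product is non-negative."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
        for w in projections
    )


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(u != v for u, v in zip(a, b))


def nearest_by_hamming(query, database, projections):
    """Index of the database item whose code is closest to the query's code."""
    q = encode(query, projections)
    codes = [encode(x, projections) for x in database]
    return min(range(len(database)), key=lambda i: hamming(q, codes[i]))
```

Because the projection vectors here are drawn without looking at the data, the resulting hash functions are independent of the data distribution, which is the limitation the abstract says the proposed method addresses.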