• DocumentCode
    651540
  • Title

    Compressing Locality Sensitive Hashing Tables

  • Author

    Santoyo, Francisco ; Chavez, E. ; Tellez, Eric S.

  • Author_Institution
    Div. de Estudios de Posgrado de la Fac. de Ing. Electr., Univ. Michoacana de San Nicolas de Hidalgo, Hidalgo, Mexico
  • fYear
    2013
  • fDate
    Oct. 30 2013-Nov. 1 2013
  • Firstpage
    41
  • Lastpage
    46
  • Abstract
    Locality Sensitive Hashing (LSH) is the industry standard for proximity searching on collections of data having coordinates. An LSH index applies a set of hashing functions to the representation of an object to identify objects proximal to a query while leaving distal objects apart. In other words, objects with the same hash will be mutually proximal with high probability. LSH is very fast and gives probabilistic guarantees on the quality of the results. On the other hand, mobile applications using proximity queries are becoming commonplace. Feature extraction can be done on a smartphone. However, the actual query relies on a wireless link because memory is a scarce resource. To tackle this problem, we present in this paper a method to compress the LSH index while still being able to query it without decompressing. The query speed is practically the same, and can even be faster. We derive a lower bound on the memory requirements for the compressed representation and present an implementation using close to optimal storage. We provide an extensive experimental comparison of our compressed representation against the uncompressed one over a large database of 55 million objects. We obtained a compression ratio ranging from 70% to 80% without slowing down the search in practice.
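The abstract's description of an LSH index (several hash functions, with colliding objects treated as likely neighbours) can be illustrated with a minimal, uncompressed sketch. It assumes random-hyperplane hashing, which the record does not specify, and it is not the compression method of the paper; the names `make_hash`, `LSHIndex`, `bits` and `tables` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def make_hash(dim, bits, rng):
    """Build one hash function from `bits` random hyperplanes (an assumed LSH family)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

    def h(v):
        key = 0
        for p in planes:
            dot = sum(a * b for a, b in zip(p, v))
            # One bit per hyperplane: which side of the plane the vector falls on.
            key = (key << 1) | (1 if dot >= 0 else 0)
        return key

    return h


class LSHIndex:
    """Plain (uncompressed) LSH index: one hash table per hash function."""

    def __init__(self, dim, bits=8, tables=4, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hash(dim, bits, rng) for _ in range(tables)]
        self.tables = [defaultdict(list) for _ in range(tables)]

    def add(self, obj_id, v):
        # Store the object id in the bucket selected by each hash function.
        for h, t in zip(self.hashes, self.tables):
            t[h(v)].append(obj_id)

    def candidates(self, q):
        # Union of the buckets the query falls into; these are likely neighbours.
        out = set()
        for h, t in zip(self.hashes, self.tables):
            out.update(t.get(h(q), ()))
        return out


idx = LSHIndex(dim=3)
idx.add("a", [1.0, 2.0, 3.0])
print(idx.candidates([2.0, 4.0, 6.0]))
```

In this sketch, vectors pointing in the same direction always share a bucket, and an opposite vector never does. The bucket lists in `tables` are the part of such an index that grows with the collection, so they are the natural target for the kind of compression the abstract describes.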
  • Keywords
    cryptography; data acquisition; data compression; probability; query formulation; storage management; LSH index; close-to-optimal storage; compress representation; compression ratio; data collections; distal objects; feature extraction; hashing functions; industry standard; locality sensitive hashing tables; memory requirements; mobile applications; object representation; probabilistic guarantees; proximal objects identification; proximity queries; proximity searching tasks; query speed; smart phone; wireless link; Approximation algorithms; Electronic mail; Indexes; Measurement; Memory management; Probabilistic logic; Locality Sensitive Hashing; Succinct Proximity Search Indexes;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science (ENC), 2013 Mexican International Conference on
  • Conference_Location
    Morelia
  • ISSN
    1550-4069
  • Type

    conf

  • DOI
    10.1109/ENC.2013.12
  • Filename
    6679818