• DocumentCode
    3724042
  • Title

    Diamond Sampling for Approximate Maximum All-Pairs Dot-Product (MAD) Search

  • Author

    Grey Ballard;Tamara G. Kolda;Ali Pinar;C. Seshadhri

  • Author_Institution
    Data Sci. &
  • fYear
    2015
  • Firstpage
    11
  • Lastpage
    20
  • Abstract
    Given two sets of vectors, A = {a_1, ..., a_m} and B = {b_1, ..., b_n}, our problem is to find the top-t dot products, i.e., the largest |a_i · b_j| among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach that avoids direct computation of all mn dot products. We select diamonds (i.e., four-cycles) from the weighted tripartite representation of A and B. The probability of selecting a diamond corresponding to pair (i, j) is proportional to (a_i · b_j)^2, amplifying the focus on the largest-magnitude entries. Experimental results indicate that diamond sampling is orders of magnitude faster than direct computation and requires far fewer samples than any competing approach. We also apply diamond sampling to the special case of maximum inner product search, and get significantly better results than the state-of-the-art hashing methods.
  • Keywords
    "Diamonds","Indexes","Search problems","Manganese","Sparse matrices","Data mining","Collaboration"
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2015 IEEE International Conference on
  • ISSN
    1550-4786
  • Type

    conf

  • DOI
    10.1109/ICDM.2015.46
  • Filename
    7373305
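
The sampling idea in the abstract can be illustrated with a minimal sketch. The abstract states only that pair (i, j) is selected with probability proportional to (a_i · b_j)^2; the specific wedge weights, the signed score update, and the final re-ranking of candidates below are assumptions chosen so that the expected score of (i, j) is proportional to (a_i · b_j)^2. The names exact_top_t, diamond_top_t, num_samples, and num_candidates are hypothetical and are not taken from the record.

```python
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def exact_top_t(A, B, t):
    """Brute-force baseline: rank all m*n pairs by |a_i . b_j|."""
    pairs = [(i, j) for i in range(len(A)) for j in range(len(B))]
    pairs.sort(key=lambda ij: -abs(dot(A[ij[0]], B[ij[1]])))
    return pairs[:t]


def diamond_top_t(A, B, t, num_samples, num_candidates, seed=0):
    """Sampling sketch (assumed details): A and B are lists of d-dim vectors."""
    rng = random.Random(seed)
    m, n, d = len(A), len(B), len(A[0])
    a_norm = [sum(abs(x) for x in row) for row in A]                 # ||a_i||_1
    b_col = [sum(abs(B[j][k]) for j in range(n)) for k in range(d)]  # sum_j |b_jk|

    # Assumed weight of the edge (i, k): |a_ik| * ||a_i||_1 * sum_j |b_jk|.
    edges = [(i, k) for i in range(m) for k in range(d)]
    weights = [abs(A[i][k]) * a_norm[i] * b_col[k] for i, k in edges]

    scores = {}
    for i, k in rng.choices(edges, weights=weights, k=num_samples):
        # Complete the four-cycle i - k - j - k2 - i.
        j = rng.choices(range(n), weights=[abs(B[jj][k]) for jj in range(n)])[0]
        k2 = rng.choices(range(d), weights=[abs(x) for x in A[i]])[0]
        sign = 1.0 if A[i][k] * B[j][k] * A[i][k2] > 0 else -1.0
        # In expectation this accumulates a multiple of (a_i . b_j)^2.
        scores[(i, j)] = scores.get((i, j), 0.0) + sign * B[j][k2]

    # Keep the highest-scoring pairs, then re-rank them by exact dot product.
    candidates = sorted(scores, key=scores.get, reverse=True)[:num_candidates]
    candidates.sort(key=lambda ij: -abs(dot(A[ij[0]], B[ij[1]])))
    return candidates[:t]
```

Only num_candidates exact dot products are computed after sampling, compared with all mn for the baseline. The sample and candidate counts are tuning knobs in this sketch, and the per-sample weight lists are rebuilt naively for brevity rather than speed.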