• DocumentCode
    610353
  • Title

    Efficient search algorithm for SimRank

  • Author

    Fujiwara, Yuichiro ; Nakatsuji, M. ; Shiokawa, H. ; Onizuka, M.

  • Author_Institution
    NTT Software Innovation Center, Tokyo, Japan
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    589
  • Lastpage
    600
  • Abstract
    Graphs are a fundamental data structure and have been employed to model objects as well as their relationships. The similarity of objects on the web (e.g., webpages, photos, music, micro-blogs, and social networking service users) is the key to identifying relevant objects in many recent applications. SimRank, proposed by Jeh and Widom, provides a good similarity score and has been successfully used in many applications such as web spam detection, collaborative tagging analysis, link prediction, and so on. SimRank computes similarities iteratively, and it needs O(N4T) time and O(N2) space for similarity computation where N and T are the number of nodes and iterations, respectively. Unfortunately, this iterative approach is computationally expensive. The goal of this work is to process top-k search and range search efficiently for a given node. Our solution, SimMat, is based on two ideas: (1) It computes the approximate similarity of a selected node pair efficiently in non-iterative style based on the Sylvester equation, and (2) It prunes unnecessary approximate similarity computations when searching for the high similarity nodes by exploiting estimations based on the Cauchy-Schwarz inequality. These two ideas reduce the time and space complexities of the proposed approach to O(Nn) where n is the target rank of the low-rank approximation (n ≪ N in practice). Our experiments show that our approach is much faster, by several orders of magnitude, than previous approaches in finding the high similarity nodes.
  • Keywords
    approximation theory; computational complexity; graph theory; search problems; Cauchy-Schwarz inequality; SimMat; SimRank; Sylvester equation; Web spam detection; Webpage; collaborative tagging analysis; data structure; graph; link prediction; low-rank approximation; microblog; music; noniterative style; object similarity; photo; range search; search algorithm; similarity computation; similarity node; similarity score; social networking service user; space complexity; time complexity; top-k search; Approximation methods; Eigenvalues and eigenfunctions; Equations; Iterative methods; Mathematical model; Social network services; Vectors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type

    conf

  • DOI
    10.1109/ICDE.2013.6544858
  • Filename
    6544858