Efficient search algorithm for SimRank

Author

Fujiwara, Yuichiro ; Nakatsuji, M. ; Shiokawa, H. ; Onizuka, M.

Author_Institution

NTT Software Innovation Center, Tokyo, Japan

fYear

2013

fDate

8-12 April 2013

Firstpage

589

Lastpage

600

Abstract

Graphs are a fundamental data structure and have been employed to model objects as well as their relationships. The similarity of objects on the web (e.g., webpages, photos, music, micro-blogs, and social networking service users) is the key to identifying relevant objects in many recent applications. SimRank, proposed by Jeh and Widom, provides a good similarity score and has been successfully used in many applications such as web spam detection, collaborative tagging analysis, link prediction, and so on. SimRank computes similarities iteratively, and it needs O(N⁴T) time and O(N²) space for similarity computation where N and T are the number of nodes and iterations, respectively. Unfortunately, this iterative approach is computationally expensive. The goal of this work is to process top-k search and range search efficiently for a given node. Our solution, SimMat, is based on two ideas: (1) It computes the approximate similarity of a selected node pair efficiently in non-iterative style based on the Sylvester equation, and (2) It prunes unnecessary approximate similarity computations when searching for the high similarity nodes by exploiting estimations based on the Cauchy-Schwarz inequality. These two ideas reduce the time and space complexities of the proposed approach to O(Nn) where n is the target rank of the low-rank approximation (n ≪ N in practice). Our experiments show that our approach is much faster, by several orders of magnitude, than previous approaches in finding the high similarity nodes.

Keywords

approximation theory; computational complexity; graph theory; search problems; Cauchy-Schwarz inequality; SimMat; SimRank; Sylvester equation; Web spam detection; Webpage; collaborative tagging analysis; data structure; graph; link prediction; low-rank approximation; microblog; music; noniterative style; object similarity; photo; range search; search algorithm; similarity computation; similarity node; similarity score; social networking service user; space complexity; time complexity; top-k search; Approximation methods; Eigenvalues and eigenfunctions; Equations; Iterative methods; Mathematical model; Social network services; Vectors;

fLanguage

English

Publisher

ieee

Conference_Titel

Data Engineering (ICDE), 2013 IEEE 29th International Conference on

Conference_Location

Brisbane, QLD

ISSN

1063-6382

Print_ISBN

978-1-4673-4909-3

Electronic_ISBN

1063-6382

Type

conf

DOI

10.1109/ICDE.2013.6544858

Filename

6544858