DocumentCode
1761784
Title
Fast All-Pairs SimRank Assessment on Large Graphs and Bipartite Domains
Author
Weiren Yu; Xuemin Lin; Wenjie Zhang; Julie A. McCann
Author_Institution
Dept. of Comput., Imperial Coll. London, London, UK
Volume
27
Issue
7
fYear
2015
fDate
July 1 2015
Firstpage
1810
Lastpage
1823
Abstract
SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in O(Kmn) time on a graph of n vertices and m edges, for K iterations. However, computations among different partial sums may be redundant. Besides, to guarantee a given accuracy ε, the existing SimRank needs K = ⌈log_C ε⌉ iterations, where C is a damping factor, and this geometric rate of convergence is slow when high accuracy is required. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to O(Kd′n²) time, where d′ is typically much smaller than m/n, the average in-degree of the graph. (2) A new differential SimRank equation is proposed, which represents the SimRank matrix as an exponential sum of transition matrices rather than the geometric sum used by the conventional counterpart. This further speeds up the convergence of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from O(Kmn) to O(Km′n) time, where m′ (≤ m) is the number of edges in a reduced graph after edge clustering, which is typically much smaller than m. Using real and synthetic data, we empirically verify that (1) our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude; (2) the revised notion of SimRank further achieves a 5X speedup on large graphs while still fairly well preserving the relative order of the original SimRank scores; (3) our finer-grained partial max memoization for the Minimax SimRank variation in bipartite domains is 5X–12X faster than the baselines.
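The abstract contrasts a geometric-sum SimRank iteration with an exponential-sum one, both built on the kernel Q·S·Qᵀ computed via partial sums. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. It shows the partial sums memoization attributed to [18] (the paper's clustering-based sharing of partial sums is not reproduced), the conventional iteration run for K = ⌈log_C ε⌉ rounds, and an exponential-sum variant. The exact form Ŝ = e^{-C} Σ_k (C^k/k!) Qᵏ(Qᵀ)ᵏ, the in-neighbour-list graph representation, the default C = 0.6, and all function names (identity, propagate, iterations_needed, simrank, exp_sum_simrank) are assumptions made for illustration only.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]
InNbrs = Dict[int, List[int]]  # v -> I(v), the in-neighbours of v (assumed representation)


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def propagate(T: Matrix, in_nbrs: InNbrs) -> Matrix:
    """Compute Q T Q^T, where Q[a][i] = 1/|I(a)| for i in I(a).

    Partial sums memoization: for each a, partial[j] = sum_{i in I(a)} T[i][j]
    is built once and reused for every b, so one call costs O(m n).
    """
    n = len(T)
    out = [[0.0] * n for _ in range(n)]
    for a in range(n):
        Ia = in_nbrs.get(a, [])
        if not Ia:
            continue  # vertices without in-neighbours keep a zero row
        partial = [0.0] * n
        for i in Ia:
            for j in range(n):
                partial[j] += T[i][j]
        for b in range(n):
            Ib = in_nbrs.get(b, [])
            if Ib:
                out[a][b] = sum(partial[j] for j in Ib) / (len(Ia) * len(Ib))
    return out


def iterations_needed(C: float, eps: float) -> int:
    """K = ceil(log_C eps) iterations for the conventional SimRank."""
    return math.ceil(math.log(eps) / math.log(C))


def simrank(in_nbrs: InNbrs, n: int, C: float = 0.6, eps: float = 1e-3) -> Matrix:
    """Conventional SimRank: S <- C * Q S Q^T, then reset the diagonal to 1."""
    S = identity(n)
    for _ in range(iterations_needed(C, eps)):
        S = propagate(S, in_nbrs)
        for a in range(n):
            S[a] = [C * x for x in S[a]]
            S[a][a] = 1.0
    return S


def exp_sum_simrank(in_nbrs: InNbrs, n: int, C: float = 0.6, K: int = 5) -> Matrix:
    """ASSUMED exponential-sum form: e^{-C} * sum_{k<=K} C^k/k! * Q^k (Q^T)^k."""
    T = identity(n)        # T_k = Q^k (Q^T)^k, starting from T_0 = I
    weight = math.exp(-C)  # e^{-C} * C^k / k!, starting at k = 0
    S = [[weight * x for x in row] for row in T]
    for k in range(1, K + 1):
        T = propagate(T, in_nbrs)  # T_k = Q T_{k-1} Q^T
        weight *= C / k
        for a in range(n):
            for b in range(n):
                S[a][b] += weight * T[a][b]
    return S


if __name__ == "__main__":
    # Toy graph with edges 0 -> 1 and 0 -> 2, so I(1) = I(2) = {0}.
    g = {1: [0], 2: [0]}
    print(simrank(g, 3)[1][2])          # 0.6, i.e. C
    print(exp_sum_simrank(g, 3)[1][2])  # about 0.3293, i.e. C * e^{-C}
```

Each call to propagate builds one partial-sum vector per source vertex and shares it across all targets, which is what keeps one iteration at O(mn). In the exponential-sum variant the k-th term carries the weight C^k/k!, and since every entry of Qᵏ(Qᵀ)ᵏ lies in [0, 1], the truncation error shrinks factorially rather than geometrically, consistent with the faster convergence the abstract describes.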
Keywords
graph theory; matrix algebra; pattern clustering; SimRank equation; SimRank iterations; SimRank matrix; bipartite domain; duplicate computation elimination; edge clustering; fast all-pairs SimRank assessment; large graph; minimax SimRank variation; partial max clustering method; partial max memoization; partial sums memoization; transition matrices; vertex-pair similarity assessment; Acceleration; Accuracy; Computational modeling; Convergence; Damping; Optimization; Redundancy; SimRank; Structural similarity; hyperlink analysis;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2014.2339828
Filename
6857337