DocumentCode
1031
Title
Clustering Large Probabilistic Graphs
Author
Kollios, George ; Potamias, Michalis ; Terzi, Evimaria
Author_Institution
Comput. Sci. Dept., Boston Univ., Boston, MA, USA
Volume
25
Issue
2
fYear
2013
fDate
Feb. 2013
Firstpage
325
Lastpage
336
Abstract
We study the problem of clustering probabilistic graphs. Similar to the problem of clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free. Therefore, the number of clusters is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we show the practicality of our techniques using a large social network of Yahoo! users consisting of one billion edges.
Keywords
approximation theory; graph theory; pattern clustering; random processes; social networking (online); Yahoo!; affiliation networks; approximation algorithms; correlation clustering; edit-distance-based definition; ground-truth data; noisy clusterings; objective function; probabilistic graph clustering; probabilistic protein-protein interaction networks; social networks; Approximation algorithms; Approximation methods; Clustering algorithms; Data mining; Partitioning algorithms; Probabilistic logic; Uncertainty; Uncertain data; clustering algorithms; probabilistic databases; probabilistic graphs
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2011.243
Filename
6095551