Title :
Clustering Large Probabilistic Graphs
Author :
Kollios, George; Potamias, Michalis; Terzi, Evimaria
Author_Institution :
Comput. Sci. Dept., Boston Univ., Boston, MA, USA
Abstract :
We study the problem of clustering probabilistic graphs. As with clustering standard graphs, probabilistic graph clustering has numerous applications, such as finding complexes in probabilistic protein-protein interaction (PPI) networks and discovering groups of users in affiliation networks. We extend the edit-distance-based definition of graph clustering to probabilistic graphs. We establish a connection between our objective function and correlation clustering, and use it to propose practical approximation algorithms for our problem. A benefit of our approach is that our objective function is parameter-free: the number of clusters is not specified in advance but is part of the output. We also develop methods for testing the statistical significance of the output clustering and study the case of noisy clusterings. Using a real protein-protein interaction network and ground-truth data, we show that our methods discover the correct number of clusters and identify established protein relationships. Finally, we demonstrate the practicality of our techniques on a large social network of Yahoo! users with one billion edges.
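The abstract describes the objective only in words. What follows is an illustrative sketch, not the authors' code. It assumes that edges are independent and that each node pair (u, v) exists with probability p_uv. Under that assumption, linearity of expectation gives the expected number of edge edits needed to turn a random world of the graph into the disjoint cliques defined by a clustering. Each pair in the same cluster contributes 1 - p_uv, and each pair in different clusters contributes p_uv. This is a correlation-clustering cost with those weights. The second function applies a randomized pivot scheme (KwikCluster-style) to pairs with p_uv > 1/2. That scheme is one standard correlation-clustering heuristic, chosen here for illustration. The function names `expected_edit_distance` and `pivot_cluster`, and the dictionary input format, are hypothetical.

```python
import random
from itertools import combinations


def expected_edit_distance(nodes, prob, labels):
    """Expected edit distance between a probabilistic graph and a clustering.

    Assumption (not from the source): edges are independent, and prob maps
    frozenset({u, v}) -> existence probability; missing pairs have p = 0.
    labels maps each node to a cluster id.
    """
    total = 0.0
    for u, v in combinations(nodes, 2):
        p = prob.get(frozenset((u, v)), 0.0)
        if labels[u] == labels[v]:
            total += 1.0 - p  # intra-cluster pair: an edit is needed if the edge is absent
        else:
            total += p        # inter-cluster pair: an edit is needed if the edge is present
    return total


def pivot_cluster(nodes, prob, seed=0):
    """Hypothetical pivot heuristic: no cluster count is given as input."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)  # a random permutation gives uniformly random pivots
    unclustered = set(order)
    labels, cid = {}, 0
    for pivot in order:
        if pivot not in unclustered:
            continue
        # The pivot absorbs every unclustered node it is "more likely than not" linked to.
        members = [v for v in order
                   if v in unclustered and v != pivot
                   and prob.get(frozenset((pivot, v)), 0.0) > 0.5]
        for v in [pivot] + members:
            labels[v] = cid
            unclustered.discard(v)
        cid += 1
    return labels


nodes = ["a", "b", "c", "d"]
prob = {frozenset(("a", "b")): 0.9, frozenset(("a", "c")): 0.8,
        frozenset(("b", "c")): 0.7, frozenset(("c", "d")): 0.1,
        frozenset(("b", "d")): 0.2}
labels = pivot_cluster(nodes, prob)
print(len(set(labels.values())), round(expected_edit_distance(nodes, prob, labels), 6))
```

In this toy graph, every pivot order yields the clusters {a, b, c} and {d}. The expected cost is therefore (0.1 + 0.2 + 0.3) + (0 + 0.2 + 0.1) = 0.9. The number of clusters (2) emerges from the data rather than being supplied as a parameter.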
Keywords :
approximation theory; graph theory; pattern clustering; random processes; social networking (online); Yahoo!; affiliation networks; approximation algorithms; correlation clustering; edit-distance-based definition; ground-truth data; noisy clusterings; objective function; probabilistic graph clustering; probabilistic protein-protein interaction networks; social networks; Approximation algorithms; Approximation methods; Clustering algorithms; Data mining; Partitioning algorithms; Probabilistic logic; Uncertainty; Uncertain data; clustering algorithms; probabilistic databases; probabilistic graphs;
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
DOI :
10.1109/TKDE.2011.243