Title :
Distributed community detection in web-scale networks
Author :
Ovelgonne, Michael
Author_Institution :
UMIACS, Univ. of Maryland, College Park, MD, USA
Abstract :
Partitioning large networks into smaller sub-networks (communities) is an important tool for analyzing the structure of complex linked systems. In recent years, many in-memory community detection algorithms have been proposed for graphs with millions of edges, but these algorithms cannot handle massive graphs with billions of edges. In this contribution, we show how to find community partitions of networks with billions of edges. Our approach is based on an ensemble learning scheme for community detection that provides a way to identify high-quality partitions from an ensemble of lower-quality partitions. We present a preprocessing procedure for community detection algorithms that significantly decreases the problem size. After the problem size has been reduced, traditional non-distributed community detection algorithms can be applied. We implemented a weak but highly scalable label propagation algorithm on top of the distributed-computing framework Apache Hadoop. The evaluation of our implementation on a 50-node Hadoop cluster, with datasets of up to 3.3 billion edges, shows very good results with respect to both clustering quality and scalability. For a smaller 260-million-edge network, we show that our preprocessing can improve the results of the popular Louvain modularity clustering algorithm.
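The preprocessing idea in the abstract (run a weak label propagation several times, combine the resulting partitions, and hand a smaller graph to a conventional algorithm) can be illustrated with a minimal single-machine sketch. This is not the authors' Hadoop implementation. The abstract does not say how the ensemble is combined, so the sketch assumes that nodes are merged only when every partition in the ensemble places them in the same community. The function names `label_propagation`, `core_groups`, and `contract` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter, defaultdict


def label_propagation(adj, rng, max_sweeps=20):
    """One weak partition: each node adopts the most common label among its neighbors."""
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            if counts[labels[v]] == best:
                continue  # current label is already among the most frequent
            labels[v] = rng.choice(sorted(l for l, c in counts.items() if c == best))
            changed = True
        if not changed:
            break
    return labels


def core_groups(partitions):
    """ASSUMPTION: nodes stay together only if all partitions agree on them."""
    signature_to_group = {}
    group_of = {}
    for v in partitions[0]:
        signature = tuple(p[v] for p in partitions)
        group_of[v] = signature_to_group.setdefault(signature, len(signature_to_group))
    return group_of


def contract(adj, group_of):
    """Collapse each group to one node; weights count the original edges merged."""
    weights = defaultdict(int)
    for v, neighbors in adj.items():
        for u in neighbors:
            if v < u:  # visit each undirected edge once
                a, b = sorted((group_of[v], group_of[u]))
                weights[(a, b)] += 1
    return dict(weights)


# Toy graph: two 4-cliques joined by a single bridge edge (3, 4).
adj = {v: [] for v in range(8)}
for clique in (range(0, 4), range(4, 8)):
    for v in clique:
        adj[v].extend(u for u in clique if u != v)
adj[3].append(4)
adj[4].append(3)

rng = random.Random(0)
ensemble = [label_propagation(adj, rng) for _ in range(5)]
groups = core_groups(ensemble)
reduced = contract(adj, groups)
print(len(set(groups.values())), "groups;", len(reduced), "weighted edges")
```

The weighted graph returned by `contract` is the reduced problem on which a non-distributed algorithm such as Louvain could then be run. Entries of the form `(g, g)` record edges that fall inside a single group.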
Keywords :
Internet; distributed algorithms; learning (artificial intelligence); network theory (graphs); pattern clustering; 50-node Hadoop cluster; Apache Hadoop; Louvain modularity clustering algorithm; Web-scale networks; clustering quality; community partitions; complex linked systems; distributed community detection; distributed-computing framework; edge network; ensemble learning scheme; high quality partition identification; highly scalable label propagation algorithm; in-memory community detection algorithms; preprocessing procedure; problem size; Algorithm design and analysis; Clustering algorithms; Communities; Detection algorithms; Image edge detection; Partitioning algorithms; Vectors; Community Detection; Distributed Algorithms; Graph Clustering; MapReduce;
Conference_Title :
Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on
Conference_Location :
Niagara Falls, ON