Title :
Scalable community detection from networks by computing edge betweenness on MapReduce
Author :
Seunghyeon Moon ; Jae-Gil Lee ; Minseo Kang
Author_Institution :
Dept. of Knowledge Service Eng., KAIST, Daejeon, South Korea
Abstract :
Community detection from social network data has gained much attention from academia and industry because of its many real-world applications. The Girvan-Newman (GN) algorithm is a divisive hierarchical clustering algorithm for community detection and is regarded as one of the most popular such algorithms. It exploits the concept of edge betweenness to divide a network into multiple communities. Although it is widely used, it scales poorly to large networks because it must compute the shortest paths between every pair of nodes in a network. In this paper, we develop a parallel version of the GN algorithm to support large-scale networks. To this end, we propose a new algorithm, the Shortest Path Betweenness MapReduce Algorithm (SPB-MRA), which uses the MapReduce model. This algorithm consists of four major stages, and all operations are executed in parallel. In addition, we suggest an approximation technique to further speed up community detection. We implemented SPB-MRA on Hadoop, which is the most popular open-source platform for MapReduce, and then conducted performance tests for SPB-MRA on Amazon EC2 instances. The results showed that elapsed time decreases almost linearly as the number of reducers increases and that the approximation technique introduces negligible error.
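To make the abstract's description of the GN algorithm concrete, the following is a minimal serial sketch in Python of edge betweenness on an unweighted, undirected graph, followed by one divisive GN step. It is not SPB-MRA: the abstract does not describe the four MapReduce stages or the approximation technique, so none of that is reproduced here. The function names (`edge_betweenness`, `components`, `girvan_newman_step`) and the adjacency-dict representation are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an undirected, unweighted graph.

    adj maps each node to the set of its neighbours (illustrative format).
    """
    bet = defaultdict(float)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit among edges.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += credit
                delta[v] += credit
    # Every node pair was visited from both endpoints, so halve the totals.
    return {edge: value / 2.0 for edge, value in bet.items()}


def components(adj):
    """Connected components, each returned as a set of nodes."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


def girvan_newman_step(adj):
    """Remove highest-betweenness edges until the component count grows."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    initial = len(components(adj))
    while len(components(adj)) == initial and any(adj.values()):
        bet = edge_betweenness(adj)  # recomputed after every removal
        u, v = tuple(max(bet, key=bet.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by a single bridge edge (3, 4).
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
communities = girvan_newman_step(graph)
```

In this example the bridge carries all nine shortest paths between the two triangles, so it is removed first and the graph splits into the communities {1, 2, 3} and {4, 5, 6}. The sketch also shows the cost the abstract refers to: betweenness is recomputed from every source node after each edge removal. Each per-source traversal is independent of the others, which makes it a natural unit of parallel work, though how SPB-MRA actually divides the computation is not stated in this record.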
Keywords :
pattern clustering; social networking (online); Amazon EC2 instances; GN algorithm; Girvan-Newman algorithm; Hadoop; SPB-MRA; approximation technique; community detection processes; divisive hierarchical clustering algorithm; edge betweenness; large-scale networks; multiple communities; open-source platform; real-world applications; shortest path betweenness MapReduce algorithm; social network data; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Communities; Image edge detection; Social network services; MapReduce; community detection
Conference_Title :
Big Data and Smart Computing (BIGCOMP), 2014 International Conference on
Conference_Location :
Bangkok
DOI :
10.1109/BIGCOMP.2014.6741425