DocumentCode
244687
Title
Scalable community detection from networks by computing edge betweenness on MapReduce
Author
Seunghyeon Moon ; Jae-Gil Lee ; Minseo Kang
Author_Institution
Dept. of Knowledge Service Eng., KAIST, Daejeon, South Korea
fYear
2014
fDate
15-17 Jan. 2014
Firstpage
145
Lastpage
148
Abstract
Community detection from social network data has attracted much attention from academia and industry because it has many real-world applications. The Girvan-Newman (GN) algorithm is a divisive hierarchical clustering algorithm for community detection and is regarded as one of the most popular algorithms for this task. It exploits the concept of edge betweenness to divide a network into multiple communities. Although it is widely used, it scales poorly to large networks because it must compute the shortest paths between every pair of nodes in the network. In this paper, we develop a parallel version of the GN algorithm to support large-scale networks. To this end, we propose a new algorithm, the Shortest Path Betweenness MapReduce Algorithm (SPB-MRA), which uses the MapReduce model. The algorithm consists of four major stages, and all operations are executed in parallel. In addition, we suggest an approximation technique to further speed up community detection. We implemented SPB-MRA on Hadoop, the most popular open-source platform for MapReduce, and conducted performance tests on Amazon EC2 instances. The results showed that elapsed time decreases almost linearly as the number of reducers increases and that the approximation technique introduces negligible errors.
Keywords
pattern clustering; social networking (online); Amazon EC2 instances; GN algorithm; Girvan-Newman algorithm; Hadoop; SPB-MRA; approximation technique; community detection processes; divisive hierarchical clustering algorithm; edge betweenness; large-scale networks; multiple communities; open-source platform; real-world applications; shortest path betweenness MapReduce algorithm; social network data; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Communities; Image edge detection; Social network services; MapReduce; community detection
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data and Smart Computing (BIGCOMP), 2014 International Conference on
Conference_Location
Bangkok
Type
conf
DOI
10.1109/BIGCOMP.2014.6741425
Filename
6741425