• DocumentCode
    244687
  • Title
    Scalable community detection from networks by computing edge betweenness on MapReduce
  • Author
    Seunghyeon Moon ; Jae-Gil Lee ; Minseo Kang
  • Author_Institution
    Dept. of Knowledge Service Eng., KAIST, Daejeon, South Korea
  • fYear
    2014
  • fDate
    15-17 Jan. 2014
  • Firstpage
    145
  • Lastpage
    148
  • Abstract
    Community detection from social network data has gained much attention from academia and industry because it has many real-world applications. The Girvan-Newman (GN) algorithm, a divisive hierarchical clustering algorithm, is regarded as one of the most popular algorithms for community detection. It exploits the concept of edge betweenness to divide a network into multiple communities. Although it is widely used, it scales poorly to large networks because it must calculate the shortest path between every pair of nodes in a network. In this paper, we develop a parallel version of the GN algorithm to support large-scale networks. To this end, we propose a new algorithm based on the MapReduce model, which we call the Shortest Path Betweenness MapReduce Algorithm (SPB-MRA). The algorithm consists of four major stages, and all operations are executed in parallel. In addition, we suggest an approximation technique to further speed up the community detection process. We implemented SPB-MRA on Hadoop, the most popular open-source platform for MapReduce, and conducted performance tests on Amazon EC2 instances. The results showed that elapsed time decreases almost linearly as the number of reducers increases, and that the approximation technique introduces negligible errors.
  • Keywords
    pattern clustering; social networking (online); Amazon EC2 instances; GN algorithm; Girvan-Newman algorithm; Hadoop; SPB-MRA; approximation technique; community detection processes; divisive hierarchical clustering algorithm; edge betweenness; large-scale networks; multiple communities; open-source platform; real-world applications; shortest path betweenness MapReduce algorithm; social network data; Algorithm design and analysis; Approximation algorithms; Approximation methods; Clustering algorithms; Communities; Image edge detection; Social network services; MapReduce; community detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data and Smart Computing (BIGCOMP), 2014 International Conference on
  • Conference_Location
    Bangkok
  • Type
    conf
  • DOI
    10.1109/BIGCOMP.2014.6741425
  • Filename
    6741425
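
The abstract describes the GN algorithm as repeatedly dividing a network using edge betweenness, which requires shortest paths between every pair of nodes. The following is a minimal serial sketch of that idea in Python, assuming an unweighted, undirected graph without self-loops stored as a dictionary of neighbour sets. It is not SPB-MRA: it does not use MapReduce, it does not reproduce the four stages or the approximation technique (which the abstract does not detail), and the function names edge_betweenness and remove_top_edge are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge in an unweighted, undirected graph.

    adj maps each node to the set of its neighbours (assumed: no self-loops).
    """
    score = defaultdict(float)
    for source in adj:
        # BFS from source, counting the number of shortest paths (sigma) to each node.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every unordered node pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def remove_top_edge(adj):
    """One Girvan-Newman step: delete the edge with the highest betweenness."""
    scores = edge_betweenness(adj)
    u, v = max(scores, key=scores.get)
    adj[u].discard(v)
    adj[v].discard(u)
    return frozenset((u, v))
```

Calling remove_top_edge repeatedly until the graph falls apart into connected components gives the divisive hierarchy. Because each step recomputes all-pairs shortest paths from every source node, the serial cost grows quickly with network size, which is the scalability limitation the abstract cites as motivation for a parallel version.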