• DocumentCode
    3342042
  • Title

    A parallel computing model for large-graph mining with MapReduce

  • Author

    Bin Wu ; Yuxiao Dong ; Qing Ke ; Yanan Cai

  • Author_Institution
    Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
  • Volume
    1
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    43
  • Lastpage
    47
  • Abstract
    How can we quickly find the structure and characteristics of a large-scale graph? Large-scale graphs exist everywhere: CALL graphs, the World Wide Web, Facebook networks and many more. The continued exponential growth in both the size and complexity of these graphs poses a new challenge to analysts and researchers, and a new class of algorithms and computing models for large-scale graphs is urgently needed. A promising approach to handling very large graphs is the MapReduce framework and its open-source implementation, Hadoop. Enumerating the 3-cliques of a graph is an important operation that supports structure mining, but it is difficult to carry out for very large graphs on a single computer. In this paper, we propose a MapReduce-based parallel computing model for 3-clique enumeration on a cluster system for large-scale graphs. The enumeration first extracts the one-leap information of the graph, then the two-leap information, and finally performs the key-based 3-clique enumeration. We also apply the computing model to the computation of the clustering coefficient. Most importantly, the computing model is applied to three large real-world CALL graphs, and the experimental results demonstrate the good scalability and efficiency of the model.
  • Keywords
    data mining; graph theory; parallel processing; CALL graph; Facebook networks; Hadoop; MapReduce framework; World Wide Web; cluster system; clustering coefficient; key-based 3-clique enumeration; large-graph mining; large-scale graph; one-leap information; parallel computing model; two-leap information; Clustering algorithms; Computational modeling; Data mining; Distributed databases; Parallel processing; Scalability; Social network services; 3-clique; MapReduce; clustering coefficient; graph mining; social network analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation (ICNC), 2011 Seventh International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    2157-9555
  • Print_ISBN
    978-1-4244-9950-2
  • Type

    conf

  • DOI
    10.1109/ICNC.2011.6022061
  • Filename
    6022061
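
The three-stage process named in the abstract (one-leap information, two-leap information, key-based 3-clique enumeration) and the clustering-coefficient application can be illustrated with a small single-machine sketch. This is not the authors' implementation: the abstract does not give their key design, so the function names, the choice of the endpoint pair as the join key, and the lowest-vertex-as-centre rule below are all assumptions made for illustration. Each function stands in for what would be one MapReduce job on Hadoop.

```python
from collections import defaultdict
from itertools import combinations


def one_leap(edges):
    """Stage 1 (assumed): group each vertex's direct neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def two_leap(adj):
    """Stage 2 (assumed): emit two-step paths keyed by their endpoint pair.

    Only the smallest vertex of a potential triangle acts as the centre,
    so each triangle is produced exactly once.
    """
    paths = defaultdict(list)
    for centre, neighbours in adj.items():
        higher = sorted(n for n in neighbours if n > centre)
        for a, b in combinations(higher, 2):
            paths[(a, b)].append(centre)
    return paths


def enumerate_3_cliques(edges):
    """Stage 3 (assumed): keep a two-step path only if its endpoints share an edge."""
    adj = one_leap(edges)
    edge_keys = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    triangles = []
    for (a, b), centres in two_leap(adj).items():
        if (a, b) in edge_keys:
            triangles.extend((c, a, b) for c in centres)
    return sorted(triangles), adj


def clustering_coefficients(edges):
    """Local clustering coefficient: 2 * T(v) / (deg(v) * (deg(v) - 1))."""
    triangles, adj = enumerate_3_cliques(edges)
    per_vertex = defaultdict(int)
    for tri in triangles:
        for v in tri:
            per_vertex[v] += 1
    return {
        v: (2 * per_vertex[v] / (len(n) * (len(n) - 1)) if len(n) > 1 else 0.0)
        for v, n in adj.items()
    }
```

For example, calling `enumerate_3_cliques([(1, 2), (2, 3), (1, 3), (3, 4)])` yields the single triangle on vertices 1, 2 and 3. In a distributed setting, the endpoint pair would serve as the shuffle key so that each two-step path and its closing edge meet at the same reducer.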