DocumentCode
3342042
Title
A parallel computing model for large-graph mining with MapReduce
Author
Bin Wu ; Yuxiao Dong ; Qing Ke ; Yanan Cai
Author_Institution
Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
Volume
1
fYear
2011
fDate
26-28 July 2011
Firstpage
43
Lastpage
47
Abstract
How can we quickly find the structure and characteristics of a large-scale graph? Large-scale graphs exist everywhere: CALL graphs, the World Wide Web, Facebook networks, and many more. The continued exponential growth in both the size and the complexity of these graphs poses a new challenge to analysts and researchers, and a new class of algorithms and computing models for large-scale graphs is urgently needed. A promising approach for handling very large graphs is the emerging MapReduce framework and its open-source implementation, Hadoop. Enumerating the 3-cliques of a graph is an important operation that supports structure mining, but it is difficult to carry out for very large graphs on a single computer. In this paper, we propose a parallel computing model for 3-clique enumeration on large-scale graphs, based on a cluster system and MapReduce. The enumeration first extracts the one-leap information of the graph, then the two-leap information, and finally performs the key-based 3-clique enumeration. We also apply the computing model to the computation of the clustering coefficient. Finally, the computing model is applied to three large real-world CALL graphs, and the experimental results demonstrate the good scalability and efficiency of the model.
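The three enumeration steps named in the abstract can be illustrated with a minimal in-memory sketch. It is not the authors' implementation and does not use Hadoop. It assumes that "one-leap information" means each vertex's neighbour list, that "two-leap information" means the two-edge paths through a vertex, and that the "key-based" step joins those paths with the edges that share the same endpoint pair as key. All function names (run_mapreduce, one_leap, three_cliques, clustering_coefficients) are hypothetical, and the clustering coefficient shown is the standard local definition, 2T(v) / (d(v)(d(v) - 1)).

```python
from collections import defaultdict
from itertools import combinations

EDGE = "edge"  # marker value: "the key pair is itself an edge of the graph"


def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce round (an assumption, not Hadoop)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def one_leap(edges):
    """Round 1: build each vertex's neighbour list from undirected edges."""
    def mapper(edge):
        u, v = edge
        yield u, v
        yield v, u

    def reducer(vertex, neighbours):
        yield vertex, sorted(set(neighbours))

    return run_mapreduce(edges, mapper, reducer)


def three_cliques(adjacency):
    """Round 2: emit two-edge paths keyed by their endpoints, then close them."""
    def mapper(record):
        vertex, neighbours = record
        # Only look "upwards" so each triangle is produced exactly once,
        # at its smallest vertex.
        higher = [n for n in neighbours if n > vertex]
        for n in higher:
            yield (vertex, n), EDGE
        for a, b in combinations(higher, 2):
            yield (a, b), vertex  # path a - vertex - b

    def reducer(pair, values):
        # A path closes into a 3-clique only if its endpoints are adjacent.
        if EDGE in values:
            for centre in values:
                if centre != EDGE:
                    yield (centre, pair[0], pair[1])

    return run_mapreduce(adjacency, mapper, reducer)


def clustering_coefficients(adjacency, triangles):
    """Local clustering coefficient per vertex from the enumerated 3-cliques."""
    per_vertex = defaultdict(int)
    for triangle in triangles:
        for v in triangle:
            per_vertex[v] += 1
    result = {}
    for vertex, neighbours in adjacency:
        d = len(neighbours)
        result[vertex] = 2 * per_vertex[vertex] / (d * (d - 1)) if d > 1 else 0.0
    return result


# Toy graph (made up for illustration): one triangle plus a pendant vertex.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adjacency = one_leap(edges)
triangles = three_cliques(adjacency)
coefficients = clustering_coefficients(adjacency, triangles)
print(triangles)
print(coefficients)
```

The sketch assumes integer vertex identifiers and an edge list without self-loops. Restricting each vertex to its higher-numbered neighbours is one common way to avoid reporting the same 3-clique three times; the paper may order vertices differently.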
Keywords
data mining; graph theory; parallel processing; CALL graph; Facebook networks; Hadoop; MapReduce framework; World Wide Web; cluster system; clustering coefficient; key-based 3-clique enumeration; large-graph mining; large-scale graph; one-leap information; parallel computing model; two-leap information; Clustering algorithms; Computational modeling; Data mining; Distributed databases; Parallel processing; Scalability; Social network services; 3-clique; MapReduce; clustering coefficient; graph mining; social network analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Computation (ICNC), 2011 Seventh International Conference on
Conference_Location
Shanghai
ISSN
2157-9555
Print_ISBN
978-1-4244-9950-2
Type
conf
DOI
10.1109/ICNC.2011.6022061
Filename
6022061