A parallel computing model for large-graph mining with MapReduce

Author

Bin Wu ; Yuxiao Dong ; Qing Ke ; Yanan Cai

Author_Institution

Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China

Volume

1

fYear

2011

fDate

26-28 July 2011

Firstpage

43

Lastpage

47

Abstract

How can we quickly discover the structure and characteristics of a large-scale graph? Large-scale graphs are everywhere: CALL graphs, the World Wide Web, Facebook networks and many more. The continued exponential growth in both the size and the complexity of these graphs poses new challenges for analysts and researchers, and new classes of algorithms and computing models for large-scale graphs are urgently needed. A promising approach to handling very large graphs is the emerging MapReduce framework and its open-source implementation, Hadoop. Enumerating the 3-cliques (triangles) of a graph is an important operation for structure mining, but it is difficult to perform on a single computer when the graph is very large. In this paper, we propose a parallel computing model for 3-clique enumeration in large-scale graphs that runs on a cluster with MapReduce. The enumeration proceeds in three stages: first extracting the one-leap information of the graph, then the two-leap information, and finally performing key-based 3-clique enumeration. We also apply the computing model to the computation of the clustering coefficient. In addition, we evaluate the model on three real-world large CALL graphs; the experimental results demonstrate its good scalability and efficiency.
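The abstract names the three stages but not their exact keys and values, so the following is only a minimal single-machine sketch of how such a pipeline could look, under several assumptions. The graph is taken to be simple and undirected with integer vertex IDs. `run_mapreduce` is a hypothetical in-memory stand-in for a Hadoop job (map, shuffle by key, reduce). "One-leap" is read as building adjacency lists, and "two-leap" as emitting length-2 paths (wedges) keyed by their endpoint pair. The deduplication step, which emits a wedge only from its smallest vertex so that each triangle is found exactly once, is a common technique substituted here and is not necessarily the authors' method. The local clustering coefficient uses the standard definition C(v) = 2T(v) / (k_v (k_v - 1)), where T(v) is the number of triangles containing v and k_v is its degree.

```python
from collections import defaultdict
from itertools import combinations

EDGE = "$"  # hypothetical marker value meaning "the key pair (a, b) is an edge"


def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group all values by key
    output = []
    for key in sorted(groups):  # Hadoop hands keys to reducers in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Round 1 (one-leap): turn the edge list into adjacency lists.
def one_leap_map(edge):
    u, v = edge
    yield u, v
    yield v, u


def one_leap_reduce(vertex, neighbors):
    yield vertex, sorted(set(neighbors))


# Round 2 map (two-leap): emit each edge once, plus every wedge a - v - b
# whose center v is smaller than both endpoints, keyed by the pair (a, b).
def two_leap_map(record):
    v, neighbors = record
    higher = [n for n in neighbors if n > v]  # already sorted, so a < b below
    for n in higher:
        yield (v, n), EDGE
    for a, b in combinations(higher, 2):
        yield (a, b), v


# Round 2 reduce (key-based enumeration): if (a, b) is an edge, every wedge
# center collected under that key closes a triangle (v, a, b) with v < a < b.
def key_based_reduce(pair, values):
    if EDGE in values:
        a, b = pair
        for v in values:
            if v != EDGE:
                yield (v, a, b)


def enumerate_triangles(edges):
    adjacency = run_mapreduce(edges, one_leap_map, one_leap_reduce)
    triangles = run_mapreduce(adjacency, two_leap_map, key_based_reduce)
    return triangles, dict(adjacency)


def clustering_coefficients(edges):
    """Local clustering coefficient C(v) = 2 T(v) / (k_v (k_v - 1))."""
    triangles, adjacency = enumerate_triangles(edges)
    per_vertex = defaultdict(int)
    for triangle in triangles:
        for v in triangle:
            per_vertex[v] += 1
    coeffs = {}
    for v, neighbors in adjacency.items():
        k = len(neighbors)
        coeffs[v] = 2 * per_vertex[v] / (k * (k - 1)) if k >= 2 else 0.0
    return coeffs


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(enumerate_triangles(edges)[0])  # [(1, 2, 3)]
    print(clustering_coefficients(edges))
```

Keying the second round by the endpoint pair (a, b) means that the edge marker and all wedges that could close over that edge meet at the same reducer. This is what makes the final enumeration a purely key-based, local operation, so no reducer ever needs the whole graph.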

Keywords

data mining; graph theory; parallel processing; CALL graph; Facebook networks; Hadoop; MapReduce framework; World Wide Web; cluster system; clustering coefficient; key-based 3-clique enumeration; large-graph mining; large-scale graph; one-leap information; parallel computing model; two-leap information; Clustering algorithms; Computational modeling; Data mining; Distributed databases; Parallel processing; Scalability; Social network services; 3-clique; MapReduce; graph mining; social network analysis;

fLanguage

English

Publisher

IEEE

Conference_Title

Natural Computation (ICNC), 2011 Seventh International Conference on

Conference_Location

Shanghai

ISSN

2157-9555

Print_ISBN

978-1-4244-9950-2

Type

conf

DOI

10.1109/ICNC.2011.6022061

Filename

6022061