DocumentCode
3165181
Title
On Pattern Preserving Graph Generation
Author
Hong-Han Shuai ; De-Nian Yang ; Yu, Philip S. ; Chih-Ya Shen ; Ming-Syan Chen
Author_Institution
Nat. Taiwan Univ., Taipei, Taiwan
fYear
2013
fDate
7-10 Dec. 2013
Firstpage
677
Lastpage
686
Abstract
Real datasets play an essential role in graph mining and analysis. However, most real datasets available today contain only millions of nodes. Therefore, the literature on Big Data analysis uses statistical graph generators to produce massive graphs (e.g., with billions of nodes) for evaluating the scalability of an algorithm. Nevertheless, currently popular statistical graph generators are designed to preserve only statistical metrics, such as the degree distribution, diameter, and clustering coefficient of the original social graphs. Recently, the importance of frequent graph patterns has been recognized in various works on graph mining, but unfortunately this crucial criterion has not been addressed by existing graph generators. To meet this need, we make the first attempt to design a Pattern Preserving Graph Generation (PPGG) algorithm that generates a graph containing all frequent patterns while preserving the three most popular statistical parameters: degree distribution, clustering coefficient, and average vertex degree. The experimental results show that PPGG, which we have released as a free download, is efficient and can generate a billion-node graph in approximately 10 minutes, much faster than existing graph generators.
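The three statistical parameters that PPGG is said to preserve (degree distribution, clustering coefficient, and average vertex degree) have standard definitions. The following is a minimal sketch, assuming an undirected simple graph given as an edge list and using the average local clustering coefficient as the clustering metric. It only computes these statistics for checking a generated graph against an original one. It is not the PPGG algorithm, and the function name `graph_statistics` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def graph_statistics(edges):
    """Hypothetical helper: compute the three statistics named in the abstract.

    Assumes an undirected simple graph given as (u, v) pairs; self-loops and
    duplicate edges are ignored, and isolated vertices (absent from the edge
    list) are not counted.
    Returns (degree_distribution, average_degree, average_clustering).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    if n == 0:
        return {}, 0.0, 0.0  # empty graph: nothing to measure

    degrees = {node: len(nbrs) for node, nbrs in adj.items()}

    # Degree distribution: fraction of vertices having each degree k.
    counts = Counter(degrees.values())
    degree_distribution = {k: c / n for k, c in sorted(counts.items())}

    # Average vertex degree: sum of degrees / |V|, i.e. 2|E| / |V|.
    average_degree = sum(degrees.values()) / n

    # Local clustering coefficient of v: linked neighbor pairs / k(k-1)/2.
    # Vertices with degree < 2 contribute 0, then we average over all vertices.
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    average_clustering = total / n

    return degree_distribution, average_degree, average_clustering


# Usage: a triangle (1, 2, 3) with a pendant vertex 4 attached to 3.
dist, avg_deg, avg_cc = graph_statistics([(1, 2), (2, 3), (1, 3), (3, 4)])
print(dist, avg_deg, avg_cc)
```

For the triangle-plus-pendant example, vertices 1 and 2 have clustering coefficient 1, vertex 3 has 1/3, and vertex 4 has 0, so the average clustering is 7/12. A generator that claims to preserve these parameters would be expected to keep such values close to those of the input graph.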
Keywords
Big Data; data analysis; data mining; graph theory; pattern clustering; statistical analysis; Big Data analysis; PPGG algorithm; average vertex degree; clustering coefficient; degree distribution; frequent graph pattern; graph analysis; graph mining; pattern preserving graph generation; real datasets; statistical graph generators; statistical metrics; Algorithm design and analysis; Biology; Clustering algorithms; Data mining; Databases; Generators; Histograms; Algorithms; Graph Mining;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location
Dallas, TX
ISSN
1550-4786
Type
conf
DOI
10.1109/ICDM.2013.14
Filename
6729552