DocumentCode :
1791680
Title :
FS3: A sampling based method for top-k frequent subgraph mining
Author :
Saha, Tapan K. ; Al Hasan, Mohammad
Author_Institution :
Dept. of Comput. & Inf. Sci., Indiana Univ.-Purdue Univ. Indianapolis, Indianapolis, IN, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
72
Lastpage :
79
Abstract :
Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS3, a sampling-based method. It mines a small collection of subgraphs that are most frequent in a probabilistic sense. FS3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that potentially frequent subgraphs are sampled more often. In addition, FS3 is equipped with an innovative queue manager, which stores the sampled subgraphs in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on databases of large graphs show that FS3 is efficient and that it obtains subgraphs that are the most frequent among the subgraphs of a given size.
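The two ideas named in the abstract, MCMC sampling biased toward high-scoring states and a finite queue that retains the best samples, can be illustrated with a toy sketch. This is not the authors' implementation: the state space here is a ring of integers rather than fixed-size subgraphs, the score is a hypothetical weight rather than subgraph support, and the names `BoundedTopQueue`, `metropolis_hastings_top_k`, and `ring_neighbors` are invented for illustration. The walk uses a standard Metropolis-Hastings acceptance rule and assumes a symmetric proposal and strictly positive scores.

```python
import heapq
import random


class BoundedTopQueue:
    """Finite queue that retains the highest-scoring distinct items seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []        # min-heap of (score, item); the root is the weakest entry
        self._members = set()  # items currently held, to skip repeated samples

    def offer(self, item, score):
        if item in self._members:
            return
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (score, item))
            self._members.add(item)
        elif score > self._heap[0][0]:
            # Queue is full: evict the weakest entry in favour of the new item.
            _, evicted = heapq.heapreplace(self._heap, (score, item))
            self._members.discard(evicted)
            self._members.add(item)

    def top(self, k):
        """Return up to k items, highest score first."""
        return [item for _, item in sorted(self._heap, reverse=True)[:k]]


def metropolis_hastings_top_k(score, neighbors, start, steps, capacity, k, seed=0):
    """Random walk that visits high-score states more often and queues what it sees.

    Assumes score(state) > 0 and that every state has the same number of
    neighbors, so the uniform neighbor proposal is symmetric.
    """
    rng = random.Random(seed)
    queue = BoundedTopQueue(capacity)
    current = start
    queue.offer(current, score(current))
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        # Metropolis-Hastings acceptance for a symmetric proposal.
        if rng.random() < min(1.0, score(candidate) / score(current)):
            current = candidate
        queue.offer(current, score(current))
    return queue.top(k)


# Toy state space (hypothetical): eight states on a ring, scored by a fixed weight.
weights = [1, 2, 3, 4, 5, 6, 7, 8]


def ring_neighbors(i):
    return [(i - 1) % len(weights), (i + 1) % len(weights)]


result = metropolis_hastings_top_k(
    score=lambda i: weights[i],
    neighbors=ring_neighbors,
    start=0,
    steps=5000,
    capacity=4,
    k=3,
)
print(result)
```

A min-heap is used so that the weakest retained entry can be found and evicted in logarithmic time, which keeps the queue cheap to maintain however long the walk runs.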
Keywords :
Markov processes; Monte Carlo methods; data mining; graph theory; queueing theory; sampling methods; FS3; MCMC; Markov chain Monte Carlo sampling; data mining; finite queue; fixed-size subgraphs; innovative queue manager; labeled subgraph mining; probabilistic sense; sampling based method; scientific domains; subgraph isomorphism task; top-k frequent subgraph mining; Data mining; Databases; Markov processes; Proposals; Scalability; Silicon; Software;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004359
Filename :
7004359