DocumentCode :
659634
Title :
Super-sequence frequent pattern mining on sequential dataset
Author :
Yu, Xinran ; Korkmaz, Turgay
Author_Institution :
Comput. Sci. Dept., Univ. of Texas at San Antonio, San Antonio, TX, USA
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
52
Lastpage :
59
Abstract :
Because Frequent Pattern Mining (FPM) is important in bioinformatics, web mining, social networks, and other areas, researchers have paid significant attention to FPM and its various forms. In this study, we introduce a new form that we call super-sequence pattern mining. In contrast to frequent sub-sequence pattern mining, which has been studied extensively in the literature, frequent super-sequence mining requires identifying super-sequences that may contain sequential parts from different sequences and whose total support is larger than a given threshold. In essence, finding frequent super-sequence patterns turns out to be related to the well-known NP-hard longest path problem in graphs. Accordingly, we transform a given sequential dataset into a sequence graph and formulate the problem as a k-hop longest path problem. We then propose a heuristic algorithm that uses dynamic programming techniques. The running time of our solution depends on the number of distinct items in the sequence set, not on the size of the dataset. Through experiments, we demonstrate the effectiveness of the proposed solution. We also illustrate its use on an actual web log dataset and report some interesting findings based on the frequent super-sequences identified in that dataset.
Keywords :
data mining; graph theory; graphs; pattern recognition; FPM; NP hard longest path problem; Web log dataset; Web mining; bioinformatics; dynamic programming; frequent subsequence pattern mining; frequent super sequence mining; frequent super sequences; graphs; heuristic algorithm; k-hop longest path problem; sequence graph; sequential dataset; social networks; super sequence frequent pattern mining; super sequence pattern mining; Complexity theory; Data mining; Databases; Dynamic programming; Heuristic algorithms; Silicon; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691783
Filename :
6691783