DocumentCode
659634
Title
Super-sequence frequent pattern mining on sequential dataset
Author
Yu, Xinran ; Korkmaz, Turgay
Author_Institution
Comput. Sci. Dept., Univ. of Texas at San Antonio, San Antonio, TX, USA
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
52
Lastpage
59
Abstract
Due to the importance of Frequent Pattern Mining (FPM) in bioinformatics, web mining, social networks, and other areas, researchers have paid significant attention to FPM and its various forms. In this study, we introduce a new form that we call super-sequence pattern mining. In contrast to frequent sub-sequence pattern mining, which has been studied extensively in the literature, frequent super-sequence mining requires identifying super-sequences that may contain sequential parts from different sequences and whose total support is larger than a given threshold. In essence, finding frequent super-sequence patterns turns out to be related to the well-known NP-hard longest path problem in graphs. Accordingly, we transform a given sequential dataset into a sequence graph and formulate the problem as a k-hop longest path problem. We then propose a heuristic algorithm using dynamic programming techniques. The running time of our solution depends on the number of distinct items in the sequence set but not on the size of the dataset. Through experiments, we demonstrate the effectiveness of the proposed solution. We also illustrate its use on an actual web log dataset and report some interesting findings based on the frequent super-sequences identified in it.
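As a rough illustration of the graph formulation described in the abstract, the sketch below builds a weighted sequence graph and runs a dynamic program over hop counts. It is not the authors' published algorithm: the function names `build_sequence_graph` and `best_k_hop_walk`, the choice of edge weight (how often one item immediately follows another), and the decision to allow repeated nodes (a walk rather than a simple path) are all assumptions made for this example.

```python
from collections import defaultdict


def build_sequence_graph(sequences):
    """Assumed construction: edge (a, b) is weighted by how many times
    item b immediately follows item a across all input sequences."""
    weights = defaultdict(int)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            weights[(a, b)] += 1
    return dict(weights)


def best_k_hop_walk(weights, k):
    """Heuristic: find a maximum-weight walk with exactly k edges using
    dynamic programming over the hop count. Nodes may repeat, so this
    relaxes the (NP-hard) simple longest-path problem.
    Returns (total_weight, walk), or (0, []) if no k-edge walk exists."""
    nodes = {n for edge in weights for n in edge}
    # best[v] = (weight, walk) of the best walk found so far ending at v
    best = {v: (0, [v]) for v in nodes}
    for _ in range(k):
        nxt = {}
        for (a, b), w in weights.items():
            if a in best:
                cand = best[a][0] + w
                if b not in nxt or cand > nxt[b][0]:
                    nxt[b] = (cand, best[a][1] + [b])
        best = nxt
    if not best:
        return 0, []
    return max(best.values(), key=lambda t: t[0])


# Hypothetical toy dataset, for illustration only
sequences = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["X", "B", "C"]]
graph = build_sequence_graph(sequences)
weight, walk = best_k_hop_walk(graph, 2)
```

In this sketch the result `walk` can combine parts of different input sequences, and `weight` could be compared against a support threshold. Once the graph is built, the dynamic program touches only the graph's edges on each hop, so its cost grows with the number of distinct items rather than with the number of input sequences.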
Keywords
data mining; graph theory; graphs; pattern recognition; FPM; NP hard longest path problem; Web log dataset; Web mining; bioinformatics; dynamic programming; frequent subsequence pattern mining; frequent super sequence mining; frequent super sequences; heuristic algorithm; k-hop longest path problem; sequence graph; sequential dataset; social networks; super sequence frequent pattern mining; super sequence pattern mining; Complexity theory; Data mining; Databases; Dynamic programming; Heuristic algorithms; Silicon; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691783
Filename
6691783