• DocumentCode
    659634
  • Title
    Super-sequence frequent pattern mining on sequential dataset
  • Author
    Yu, Xinran ; Korkmaz, Turgay
  • Author_Institution
    Comput. Sci. Dept., Univ. of Texas at San Antonio, San Antonio, TX, USA
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    52
  • Lastpage
    59
  • Abstract
    Because Frequent Pattern Mining (FPM) is important in bioinformatics, web mining, social networks, and other areas, researchers have paid significant attention to FPM and its various forms. In this study, we introduce a new form that we call super-sequence pattern mining. In contrast to frequent sub-sequence pattern mining, which has been studied extensively in the literature, frequent super-sequence mining requires identifying super-sequences that may contain sequential parts from different sequences and whose total support is larger than a given threshold. In essence, finding frequent super-sequence patterns turns out to be related to the well-known NP-hard longest path problem in graphs. Accordingly, we transform a given sequential dataset into a sequence graph and formulate the problem as a k-hop longest path problem. We then propose a heuristic algorithm using dynamic programming techniques. The running time of our solution depends on the number of distinct items in the sequence set, not on the size of the dataset. Through experiments, we demonstrate the effectiveness of the proposed solution. We also illustrate its use on an actual web log dataset and report some interesting facts derived from the frequent super-sequences identified in that dataset.
  • Keywords
    data mining; graph theory; graphs; pattern recognition; FPM; NP hard longest path problem; Web log dataset; Web mining; bioinformatics; dynamic programming; frequent subsequence pattern mining; frequent super sequence mining; frequent super sequences; heuristic algorithm; k-hop longest path problem; sequence graph; sequential dataset; social networks; super sequence frequent pattern mining; super sequence pattern mining; Complexity theory; Data mining; Databases; Dynamic programming; Heuristic algorithms; Silicon; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691783
  • Filename
    6691783