• DocumentCode
    416103
  • Title
    An efficient algorithm for mining frequent sequences by a new strategy without support counting
  • Author
    Chiu, Ding-Ying ; Wu, Yi-Hung ; Chen, Arbee L P
  • Author_Institution
    Dept. of Comput. Sci., Nat. Tsing Hua Univ., Hsinchu, Taiwan
  • fYear
    2004
  • fDate
    30 March-2 April 2004
  • Firstpage
    375
  • Lastpage
    386
  • Abstract
    Mining sequential patterns in large databases is an important research topic. The main challenge of mining sequential patterns is the high processing cost due to the large amount of data. We propose a new strategy called direct sequence comparison (abbreviated as DISC), which can find frequent sequences without having to compute the support counts of nonfrequent sequences. The main difference between the DISC strategy and previous work is the way nonfrequent sequences are pruned. Previous approaches are based on the antimonotone property and prune nonfrequent sequences according to shorter frequent sequences. In contrast, the DISC strategy prunes nonfrequent sequences according to other sequences of the same length. Moreover, we summarize three strategies used in previous work and design an efficient algorithm called DISC-all to take advantage of all four strategies. The experimental results show that the DISC-all algorithm outperforms the PrefixSpan algorithm on mining frequent sequences in large databases. In addition, we analyze these strategies to design a dynamic version of our algorithm, which achieves much better performance.
  • Keywords
    data mining; very large databases; PrefixSpan algorithm; direct sequence comparison; nonfrequent sequence pruning; sequential pattern mining; very large database; Algorithm design and analysis; Computer science; Costs; Data analysis; Data mining; Itemsets; Performance analysis; Transaction databases; Unsolicited electronic mail; Working environment noise
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2004. Proceedings. 20th International Conference on
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-2065-0
  • Type
    conf
  • DOI
    10.1109/ICDE.2004.1320012
  • Filename
    1320012