• DocumentCode
    472417
  • Title

    Effective Pruning Strategies for Sequential Pattern Mining

  • Author

    Yusheng, Xu ; Zhixin, Ma ; Lian, Li ; Dillon, Tharam S.

  • Author_Institution
    Lanzhou Univ., Lanzhou
  • fYear
    2008
  • fDate
    23-24 Jan. 2008
  • Firstpage
    21
  • Lastpage
    24
  • Abstract
    In this paper, we systematically explore the search space of frequent sequence mining and present two novel pruning strategies, SEP (Sequence Extension Pruning) and IEP (Item Extension Pruning), which can be used in all Apriori-like sequence mining algorithms and lattice-theoretic approaches. At the cost of a little more memory, the proposed strategies prune invalid portions of the search space and effectively reduce the total cost of frequency counting. To test their effectiveness, we optimize SPAM [2] and present the improved algorithm, SPAMSEPIEP, which uses SEP and IEP to prune the search space by sharing the lists of frequent 2-sequences. A comprehensive set of performance experiments shows that SPAMSEPIEP outperforms SPAM by a factor of 10 on small datasets and by 30% to 50% on reasonably large datasets.
  • Keywords
    data mining; very large databases; frequent sequential pattern mining; item extension pruning; large database; search space; sequence extension pruning; Data mining; Databases; Electronic mail; Information science; Itemsets; Sequences; Space exploration; Space technology; Testing; Unsolicited electronic mail
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge Discovery and Data Mining, 2008. WKDD 2008. First International Workshop on
  • Conference_Location
    Adelaide, SA
  • Print_ISBN
    978-0-7695-3090-1
  • Type

    conf

  • DOI
    10.1109/WKDD.2008.22
  • Filename
    4470342