• DocumentCode
    1679405
  • Title
    Sequential Pattern Mining with Wildcards
  • Author
    Xie, Fei ; Wu, Xindong ; Hu, Xuegang ; Gao, Jun ; Guo, Dan ; Fei, Yulian ; Hua, Ertian
  • Author_Institution
    Coll. of Comput. Sci. & Info. Eng., Hefei Univ. of Tech., Hefei, China
  • Volume
    1
  • fYear
    2010
  • Firstpage
    241
  • Lastpage
    247
  • Abstract
    Sequential pattern mining is an important research task in many domains, such as biological science. In this paper, we study the problem of mining frequent patterns from sequences with wildcards, where the user can flexibly specify the gap constraints. Given a subject sequence, a minimum support threshold, and a gap constraint, we aim to find the frequent patterns whose supports in the sequence are no less than the given threshold. We design an efficient mining algorithm, MAIL, that uses the candidate occurrences of a pattern's prefix to compute the pattern's support, which avoids rescanning the sequence. We present two pruning strategies to improve the completeness and the time efficiency of MAIL. Experiments show that MAIL mines 2 times more patterns than one of its peers and runs on average 12 times faster than another peer.
  • Keywords
    data mining; MAIL; biological science; efficient mining algorithm; frequent patterns; sequential pattern mining; wildcards; Algorithm design and analysis; Bioinformatics; Complexity theory; DNA; Genomics; Pattern matching; Postal services; candidate occurrence pruning; one-off condition; sequential pattern mining; wildcard;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on
  • Conference_Location
    Arras
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4244-8817-9
  • Type
    conf
  • DOI
    10.1109/ICTAI.2010.42
  • Filename
    5670041