• DocumentCode
    2377990
  • Title

    The Apriori property of sequence pattern mining with wildcard gaps

  • Author

    Min, Fan ; Wu, Youxi ; Wu, Xindong

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Zhangzhou Normal Univ., Zhangzhou, China
  • fYear
    2010
  • fDate
    18-18 Dec. 2010
  • Firstpage
    138
  • Lastpage
    143
  • Abstract
    In biological sequence analysis, long and frequently occurring patterns tend to be interesting. Data miners have designed pattern growth algorithms to obtain frequent patterns with periodical wildcard gaps, where the pattern frequency is defined as the number of pattern occurrences divided by the number of offset sequences. However, the existing definition set does not facilitate further research. First, some extremely frequent patterns are obviously uninteresting. Second, the Apriori property does not hold; consequently, state-of-the-art algorithms are all Apriori-like and rather complex. In this paper, we propose an alternative definition of the number of offset sequences by adding a number of dummy characters at the tail of the sequence. With the new definition, these uninteresting patterns are no longer frequent, and the Apriori property holds; hence our Apriori algorithm can mine all frequent patterns with minimal effort. Moreover, the computation of the number of offset sequences becomes straightforward. Experiments with a DNA sequence indicate that 1) the pattern frequencies under the two definition sets differ little, so it is reasonable to replace the existing one with the new one in practice, and 2) our algorithm runs fewer rounds than the best case of MMP, which is based on the existing definition set.
  • Keywords
    DNA; bioinformatics; data analysis; data mining; molecular biophysics; pattern classification; Apriori algorithm; DNA sequence; apriori property; datasets; sequence pattern mining; Apriori; Sequence pattern mining; frequency; wildcard gap;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-8303-7
  • Electronic_ISBN
    978-1-4244-8304-4
  • Type

    conf

  • DOI
    10.1109/BIBMW.2010.5703787
  • Filename
    5703787