• DocumentCode
    2335103
  • Title

    Mining mutually dependent patterns

  • Author

    Ma, Sheng ; Hellerstein, Joseph L.

  • Author_Institution
    IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    409
  • Lastpage
    416
  • Abstract
    In some domains, such as isolating problems in computer networks and discovering stock market irregularities, there is more interest in patterns consisting of infrequent but highly correlated items than in patterns that occur frequently (as defined by minsup, the minimum support level). We describe the m-pattern, a new pattern that is defined in terms of minp, the minimum probability of mutual dependence of items in the pattern. We show that all infrequent m-patterns can be discovered by an efficient algorithm that makes use of: (a) a linear algorithm to qualify an m-pattern; (b) an effective technique for candidate pruning based on a necessary condition for the presence of an m-pattern; and (c) a level-wise search for m-pattern discovery (which is possible because m-patterns are downward closed). Further, we consider frequent m-patterns, which are defined in terms of both minp and minsup. Using synthetic data, we study the scalability of our algorithm. Then, we apply our algorithm to data from a production computer network both to show the m-patterns present and to contrast with frequent patterns. We show that when minp=0, our algorithm is equivalent to finding frequent patterns. However, with a larger minp, our algorithm yields a modest number of highly correlated items, which makes it possible to mine for infrequent but highly correlated itemsets. To date, many actionable m-patterns have been discovered in production systems.
  • Keywords
    data mining; candidate pruning; frequent patterns; infrequent patterns; level-wise search; linear algorithm; m-pattern; minimum probability; minp; minsup; mutually dependent pattern mining; production computer network; scalability; Application software; Association rules; Computer networks; Data mining; Intrusion detection; Itemsets; Pattern analysis; Production; Scalability; Stock markets
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
  • Conference_Location
    San Jose, CA
  • Print_ISBN
    0-7695-1119-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2001.989546
  • Filename
    989546
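
  • Illustrative sketch (not part of the original record)
    The abstract names three ingredients: a linear qualification test, candidate pruning, and a level-wise search that relies on downward closure. The Python below is a minimal sketch of how these could fit together, not the authors' implementation. It assumes, beyond what the abstract states, that an itemset E qualifies as an m-pattern when support(E) / support({a}) >= minp for every item a in E, and that minsup is an absolute transaction count. The names support, is_m_pattern, and mine_m_patterns are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def is_m_pattern(itemset, transactions, minp):
    """Assumed linear test: one ratio check per item in the itemset."""
    sup_e = support(itemset, transactions)
    if sup_e == 0:
        return False  # an itemset that never occurs is not reported
    return all(
        sup_e / support(frozenset([a]), transactions) >= minp
        for a in itemset
    )


def mine_m_patterns(transactions, minp, minsup=1):
    """Level-wise search; returns qualifying itemsets of size >= 2."""
    txns = [frozenset(t) for t in transactions]
    items = sorted({a for t in txns for a in t})
    # Level 1: single items that meet minsup (trivially mutually dependent).
    level = {
        frozenset([a]) for a in items
        if support(frozenset([a]), txns) >= minsup
    }
    found = []
    while level:
        k = len(next(iter(level)))
        # Join step: unions of two size-k patterns that have size k + 1.
        candidates = {x | y for x in level for y in level if len(x | y) == k + 1}
        next_level = set()
        for cand in candidates:
            # Prune step (downward closure): every size-k subset must qualify.
            if any(frozenset(s) not in level for s in combinations(cand, k)):
                continue
            if support(cand, txns) >= minsup and is_m_pattern(cand, txns, minp):
                next_level.add(cand)
        found.extend(next_level)
        level = next_level
    return found
```

    Under these assumptions, setting minp to 0 makes the dependence test accept every itemset that occurs, so the search reduces to ordinary frequent-itemset mining at the given minsup, which is consistent with the equivalence claimed in the abstract. The sketch rescans the transactions for every support count and is meant only to show the control flow.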