• DocumentCode
    63950
  • Title
    Bridging Causal Relevance and Pattern Discriminability: Mining Emerging Patterns from High-Dimensional Data
  • Author
    Kui Yu; Wei Ding; Hao Wang; Xindong Wu
  • Author_Institution
    Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
  • Volume
    25
  • Issue
    12
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    2721
  • Lastpage
    2739
  • Abstract
    Building an accurate emerging pattern (EP) classifier from high-dimensional data is a nontrivial task because it inevitably poses two challenges: 1) how to efficiently extract a minimal set of strongly predictive EPs from an explosive number of candidate patterns, and 2) how to handle the highly sensitive choice of the minimal support threshold. To address these two challenges, we bridge causal relevance and EP discriminability (the predictive ability of emerging patterns) to facilitate EP mining, and we propose a new framework for mining EPs from high-dimensional data. In this framework, we study the relationships between causal relevance in a causal Bayesian network and EP discriminability in EP mining, and then reduce the pattern space of EP mining to either the direct causes and direct effects or the Markov blanket (MB) of the class attribute in a causal Bayesian network. The proposed framework is instantiated by two EP-based classifiers, CE-EP and MB-EP, where CE stands for direct Causes and direct Effects, and MB for Markov Blanket. Extensive experiments on a broad range of data sets validate the effectiveness of the CE-EP and MB-EP classifiers against other well-established methods in terms of predictive accuracy, number of patterns, running time, and sensitivity analysis.
  • Keywords
    Markov processes; belief networks; data mining; pattern classification; CE-EP; EP discriminability; EP mining; MB; MB-EP; Markov blanket; causal Bayesian network; causal relevance; direct causes; direct effects; emerging pattern classifier; emerging pattern mining; high-dimensional data; minimal support threshold; pattern discriminability; Association rules; Bayesian methods; Itemsets; Pattern recognition; Emerging patterns; causal Bayesian networks
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.218
  • Filename
    6341731