• DocumentCode
    2983845
  • Title
    Direct Discovery of High Utility Itemsets without Candidate Generation
  • Author
    Junqiang Liu; Ke Wang; Benjamin C. M. Fung
  • Author_Institution
    Information & Electronic Engineering, Zhejiang Gongshang University, Hangzhou, China
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    984
  • Lastpage
    989
  • Abstract
    Utility mining emerged recently to address a limitation of frequent itemset mining by introducing interestingness measures that reflect both statistical significance and the user's expectations. Among utility mining problems, utility mining under the itemset share framework is particularly hard because its interestingness measure has no anti-monotone property. State-of-the-art methods for this problem all employ a two-phase, candidate-generation approach, which scales poorly because of the huge number of candidates. This paper proposes a high utility itemset growth approach that works in a single phase without generating candidates. Our basic approach is to enumerate itemsets by prefix extension, to prune the search space by utility upper bounding, and to maintain the original utility information throughout mining in a novel data structure. This data structure enables us to compute a tight bound for powerful pruning and to identify high utility itemsets directly, efficiently, and scalably. We further improve efficiency significantly by introducing recursive irrelevant item filtering for sparse data and a lookahead strategy for dense data. Extensive experiments on sparse and dense, synthetic and real data suggest that our algorithm outperforms state-of-the-art algorithms by more than one order of magnitude.
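    The core ideas named in the abstract (itemset utility, prefix-extension enumeration, and pruning by a utility upper bound because utility is not anti-monotone) can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits the novel data structure, irrelevant item filtering, and lookahead, and it substitutes the commonly used "remaining utility" bound, which may be looser than the paper's tight bound. The utility definitions assume the usual itemset share setting (item utility = quantity x unit profit); the names PROFIT, DB, item_utility, and mine_high_utility, and the toy data, are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy database: each transaction maps item -> quantity
# (internal utility); PROFIT maps item -> unit profit (external utility).
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}
DB: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "c": 5},
]


def item_utility(item: str, t: Dict[str, int], profit: Dict[str, int]) -> int:
    """u(i, T) = quantity of i in T times the unit profit of i."""
    return t[item] * profit[item]


def mine_high_utility(db: List[Dict[str, int]], profit: Dict[str, int],
                      min_util: int) -> Dict[Tuple[str, ...], int]:
    """Depth-first prefix-extension search with a remaining-utility bound."""
    order = sorted(profit)                      # fixed total order on items
    rank = {item: k for k, item in enumerate(order)}
    found: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], tids: List[int]) -> None:
        # Only extend with items ranked after the prefix's last item,
        # so every itemset is enumerated exactly once.
        start = rank[prefix[-1]] + 1 if prefix else 0
        for item in order[start:]:
            ext = prefix + (item,)
            ext_tids = [k for k in tids if item in db[k]]
            if not ext_tids:
                continue
            # Exact utility of ext, summed over the transactions containing it.
            util = sum(item_utility(i, db[k], profit)
                       for k in ext_tids for i in ext)
            if util >= min_util:
                found[ext] = util
            # Upper bound for any extension of ext by later-ranked items:
            # its utility plus the utility of later-ranked items in each
            # supporting transaction. Prune the subtree if the bound is too low.
            bound = util + sum(item_utility(i, db[k], profit)
                               for k in ext_tids for i in db[k]
                               if rank[i] > rank[item])
            if bound >= min_util:
                grow(ext, ext_tids)

    grow((), list(range(len(db))))
    return found


print(mine_high_utility(DB, PROFIT, min_util=11))
# {('a',): 15, ('a', 'c'): 15, ('b',): 12, ('b', 'c'): 11}
```

    With min_util=11 the result contains ('a', 'c') with utility 15 even though its subset ('c',) has utility 8 and is not reported; this is the missing anti-monotone property, which is why the search prunes on an upper bound over all extensions rather than on an itemset's own utility.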
  • Keywords
    data mining; information filtering; candidate generation approach; dense data; frequent itemset mining; high utility itemset discovery; high utility itemset growth approach; interestingness measure; lookahead strategy; prefix extension; recursive irrelevant item filtering; sparse data; statistical significance; user expectation; utility mining; utility upper bounding; Educational institutions; Electronic mail; Itemsets; Scalability; Upper bound; frequent itemsets; high utility itemsets; pattern mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2012 IEEE 12th International Conference on Data Mining (ICDM)
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2012.20
  • Filename
    6413821