DocumentCode
2983845
Title
Direct Discovery of High Utility Itemsets without Candidate Generation
Author
Junqiang Liu ; Ke Wang ; Benjamin C. M. Fung
Author_Institution
Inf. & Electron. Eng., Zhejiang Gongshang Univ., Hangzhou, China
fYear
2012
fDate
10-13 Dec. 2012
Firstpage
984
Lastpage
989
Abstract
Utility mining emerged recently to address the limitation of frequent itemset mining by introducing interestingness measures that reflect both statistical significance and the user's expectation. Among utility mining problems, utility mining with the itemset share framework is a hard one because no anti-monotone property holds for the interestingness measure. The state-of-the-art works on this problem all employ a two-phase, candidate generation approach, which scales poorly because of the huge number of candidates. This paper proposes a high utility itemset growth approach that works in a single phase without generating candidates. Our basic approach is to enumerate itemsets by prefix extensions, to prune the search space by utility upper bounding, and to maintain original utility information in the mining process by a novel data structure. Such a data structure enables us to compute a tight bound for powerful pruning and to directly identify high utility itemsets in an efficient and scalable way. We further enhance the efficiency significantly by introducing recursive irrelevant item filtering with sparse data, and a lookahead strategy with dense data. Extensive experiments on sparse and dense, synthetic and real data suggest that our algorithm outperforms the state-of-the-art algorithms by more than one order of magnitude.
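The single-phase idea in the abstract (enumerate itemsets by prefix extension, prune with a utility upper bound, and read exact utilities directly from the data) can be illustrated with a minimal sketch. It is not the paper's algorithm or data structure: it assumes non-negative utilities, represents each transaction as a plain dictionary, and uses a simple "prefix utility plus remaining utility" bound chosen only for illustration. The function name `mine_high_utility_itemsets` and the sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a transaction maps each item to its (non-negative) utility there.
Transaction = Dict[str, int]


def mine_high_utility_itemsets(
    transactions: List[Transaction], min_util: int
) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose total utility is at least min_util."""
    items = sorted({item for t in transactions for item in t})
    results: Dict[FrozenSet[str], int] = {}

    def grow(prefix: Tuple[str, ...], start: int,
             rows: List[Tuple[int, int]]) -> None:
        # rows holds (transaction index, utility of prefix in that transaction).
        for pos in range(start, len(items)):
            item = items[pos]
            ext_rows: List[Tuple[int, int]] = []
            utility = 0  # exact utility of prefix + item
            bound = 0    # upper bound for every further extension
            for tid, prefix_util in rows:
                t = transactions[tid]
                if item not in t:
                    continue
                u = prefix_util + t[item]
                # Utility of items that could still be appended (later in order).
                remaining = sum(v for k, v in t.items() if k > item)
                ext_rows.append((tid, u))
                utility += u
                bound += u + remaining
            itemset = prefix + (item,)
            if ext_rows and utility >= min_util:
                results[frozenset(itemset)] = utility  # no candidate phase
            if bound >= min_util:
                grow(itemset, pos + 1, ext_rows)  # otherwise prune the subtree

    grow((), 0, [(tid, 0) for tid in range(len(transactions))])
    return results


# Hypothetical sample data, for illustration only.
sample = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
    {"a": 5, "d": 2},
]
found = mine_high_utility_itemsets(sample, min_util=15)
```

On this sample, {b, c, d} reaches the threshold although none of its proper subsets does, which is the missing anti-monotone property the abstract refers to; the search therefore prunes on the upper bound rather than on the utility of the current itemset.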
Keywords
data mining; information filtering; candidate generation approach; dense data; frequent itemset mining; high utility itemset discovery; high utility itemset growth approach; interestingness measure; lookahead strategy; prefix extension; recursive irrelevant item filtering; sparse data; statistical significance; user expectation; utility mining; utility upper bounding; Data mining; Educational institutions; Electronic mail; Itemsets; Scalability; Upper bound; Utility mining; frequent itemsets; high utility itemsets; pattern mining;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
ISSN
1550-4786
Print_ISBN
978-1-4673-4649-8
Type
conf
DOI
10.1109/ICDM.2012.20
Filename
6413821