DocumentCode
402869
Title
Scalable algorithm for mining maximal frequent itemsets
Author
Li, Qing-hua ; Wang, Hui ; He, Ye ; Jiang, Sheng-yi
Author_Institution
Sch. of Comput., Huazhong Univ. of Sci. & Technol., Wuhan, China
Volume
1
fYear
2003
fDate
2-5 Nov. 2003
Firstpage
143
Abstract
The discovery of frequent itemsets is a very computational and I/O intensive task, and beyond a certain database size, it is crucial to leverage and the combined computational power of multiple processors for fast response and scalability. In this paper we present new scalable algorithm for maximal frequent itemset mining. It decomposes the search space by prefix-based equivalence classes, distributes work among the processors and selectively duplicates databases in such a way that each processor can compute the maximal frequent itemsets independently. It utilizes multiple level backtrack pruning strategy, along with vertical database format, counting frequency by simple tid-list intersection operation. These techniques eliminate the need for synchronization, drastically cutting down the communication cost. The analysis and experimental results demonstrate that our approach is well scalable in speedup and sizeup.
Keywords
data mining; database management systems; equivalence classes; database duplication; decomposition technique; independent classes; maximal frequent itemsets mining; multiple level backtrack pruning strategy; multiple processors; parallel mining; prefix-based equivalence classes; scalable algorithm; search space; tid-list intersection operation; vertical database format; Algorithm design and analysis; Costs; Data mining; Databases; Distributed computing; Educational institutions; Frequency synchronization; Itemsets; Partitioning algorithms; Scalability;
fLanguage
English
Publisher
ieee
Conference_Titel
Machine Learning and Cybernetics, 2003 International Conference on
Print_ISBN
0-7803-8131-9
Type
conf
DOI
10.1109/ICMLC.2003.1264459
Filename
1264459
Link To Document