Scalable algorithm for mining maximal frequent itemsets

Author

Li, Qing-hua ; Wang, Hui ; He, Ye ; Jiang, Sheng-yi

Author_Institution

Sch. of Comput., Huazhong Univ. of Sci. & Technol., Wuhan, China

Volume

1

fYear

2003

fDate

2-5 Nov. 2003

Firstpage

143

Abstract

The discovery of frequent itemsets is a computationally and I/O-intensive task, and beyond a certain database size it is crucial to leverage the combined computational power of multiple processors for fast response and scalability. In this paper we present a new scalable algorithm for maximal frequent itemset mining. It decomposes the search space into prefix-based equivalence classes, distributes the work among the processors, and selectively duplicates the database in such a way that each processor can compute the maximal frequent itemsets independently. It uses a multiple-level backtrack pruning strategy together with a vertical database format, counting frequency by a simple tid-list intersection operation. These techniques eliminate the need for synchronization, drastically cutting down the communication cost. The analysis and experimental results demonstrate that our approach scales well in both speedup and sizeup.
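
The abstract names three building blocks: a vertical database format, support counting by tid-list intersection, and prefix-based equivalence classes. The following Python fragment is a minimal sketch of those ideas only, not the authors' algorithm; it omits the parallel distribution, database duplication, and backtrack pruning, and the function names (`to_vertical`, `support`, `prefix_classes`) and the toy transactions are assumptions made for illustration.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Convert horizontal transactions to a vertical layout:
    each item maps to its tid-list (set of transaction ids)."""
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


def support(itemset, tidlists):
    """Support of an itemset is the size of the intersection
    of the tid-lists of its items."""
    tids = set.intersection(*(tidlists[i] for i in itemset))
    return len(tids)


def prefix_classes(frequent_items):
    """Group candidate 2-itemsets by their first item. Each group is
    one prefix-based equivalence class and could, in principle, be
    handed to a separate processor."""
    items = sorted(frequent_items)
    return {p: [(p, q) for q in items[i + 1:]] for i, p in enumerate(items)}


# Hypothetical toy database of four transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
min_support = 2

tidlists = to_vertical(transactions)
frequent = [item for item, tids in tidlists.items() if len(tids) >= min_support]
classes = prefix_classes(frequent)

# Count support of every candidate inside each class independently.
for prefix, candidates in classes.items():
    for cand in candidates:
        print(prefix, cand, support(cand, tidlists))
```

Because every candidate in a class shares the same prefix, a worker holding the tid-lists for that class can count supports without consulting any other worker, which is the property the abstract relies on to avoid synchronization.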

Keywords

data mining; database management systems; equivalence classes; database duplication; decomposition technique; independent classes; maximal frequent itemsets mining; multiple level backtrack pruning strategy; multiple processors; parallel mining; prefix-based equivalence classes; scalable algorithm; search space; tid-list intersection operation; vertical database format; Algorithm design and analysis; Costs; Data mining; Databases; Distributed computing; Educational institutions; Frequency synchronization; Itemsets; Partitioning algorithms; Scalability

fLanguage

English

Publisher

ieee

Conference_Titel

Machine Learning and Cybernetics, 2003 International Conference on

Print_ISBN

0-7803-8131-9

Type

conf

DOI

10.1109/ICMLC.2003.1264459

Filename

1264459