DocumentCode
2513871
Title
Balanced parallel FP-Growth with MapReduce
Author
Zhou, Le ; Zhong, Zhiyong ; Chang, Jin ; Li, Junjie ; Huang, Joshua Zhexue ; Feng, Shengzhong
Author_Institution
Center for High Performance Comput., Chinese Acad. of Sci., Shenzhen, China
fYear
2010
fDate
28-30 Nov. 2010
Firstpage
243
Lastpage
246
Abstract
Frequent itemset mining (FIM) plays an essential role in mining associations, correlations, and many other important data mining tasks. Unfortunately, as datasets grow larger day by day, most FIM algorithms in the literature become ineffective because of either excessive resource requirements or excessive communication cost. In this paper, we propose BPFP, a balanced parallel FP-Growth algorithm based on the PFP algorithm [1], which parallelizes FP-Growth using the MapReduce approach. BPFP adds a load-balancing feature to PFP, which improves parallelization and thereby performance. In an empirical study, BPFP outperformed PFP, which uses a simple grouping strategy.
Keywords
data mining; distributed processing; BPFP; FIM algorithms; MapReduce; balanced parallel FP-growth algorithm; frequent itemset mining; Algorithm design and analysis; Clustering algorithms; Data mining; Estimation; Itemsets; Partitioning algorithms; Algorithms; Distributed computing
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-8883-4
Type
conf
DOI
10.1109/YCICT.2010.5713090
Filename
5713090
Link To Document