Title :
Improvement and Research of FP-Growth Algorithm Based on Distributed Spark
Author :
Lingling Deng;Yuansheng Lou
Author_Institution :
Coll. of Comput. &
Abstract :
The FP-growth algorithm, a representative of non-pruning algorithms, is widely used for mining transaction datasets. However, it is sensitive to the amount of computation and to the scale of the dataset. When the FP-tree is built, the search operation is the most time-consuming step and has relatively high complexity. When the horizontal or vertical dimension of the dataset is large, mining efficiency drops and mining may even fail. Two widely used strategies for addressing these problems are reducing the time complexity of the search and applying distributed computing. This paper presents SPFP, a distributed algorithm based on the Spark framework and an improved FP-growth algorithm. Test results show that, compared with the PFP algorithm based on MapReduce and with the OPFP algorithm based on Spark and the original FP-growth algorithm, SPFP offers high efficiency, good cluster performance, and flexibility.
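The abstract does not say how SPFP reduces the search cost during FP-tree construction. As an illustration only, the following minimal single-machine sketch builds an FP-tree in the usual two passes and, as one plausible way to cut the child-search cost, stores each node's children in a dictionary keyed by item. The names `FPNode` and `build_fp_tree` and the sample transactions are hypothetical, and this is not the authors' SPFP method or a Spark implementation.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        # Assumption for illustration: a dict gives constant-time child lookup
        # on average, instead of scanning a list of children.
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Pass 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode()
    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken by item name).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical sample data, not taken from the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "c"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
```

With a minimum support of 2, item `d` is dropped, and transactions that share a prefix such as `a, b` share a single path in the tree, which is what keeps the structure compact.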
Keywords :
"Algorithm design and analysis","Clustering algorithms","Data mining","Time complexity","Sparks","Itemsets","Data structures"
Conference_Title :
Cloud Computing and Big Data (CCBD), 2015 International Conference on
DOI :
10.1109/CCBD.2015.15