DocumentCode
1958074
Title
A distributed frequent itemset mining algorithm based on Spark
Author
Feng Gui ; Yunlong Ma ; Feng Zhang ; Min Liu ; Fei Li ; Weiming Shen ; Hua Bai
Author_Institution
Sch. of Electron. & Inf. Eng., Tongji Univ., Shanghai, China
fYear
2015
fDate
6-8 May 2015
Firstpage
271
Lastpage
275
Abstract
Frequent itemset mining is an important step in association rule mining. Traditional frequent itemset mining algorithms have certain limitations. For example, the Apriori algorithm has to scan the input data repeatedly, which leads to high I/O load and low performance, and the FP-Growth algorithm is limited by the capacity of the computer's main memory because it needs to build an FP-tree and mine frequent itemsets from that FP-tree in memory. With the arrival of the Big Data era, these limitations become more prominent when mining large-scale data. In this paper, DPBM, a distributed matrix-based pruning algorithm based on Spark, is proposed for frequent itemset mining. DPBM greatly reduces the number of candidate itemsets by introducing a novel pruning technique for the matrix-based frequent itemset mining algorithm, an improved Apriori algorithm that needs to scan the input data only once. In addition, implementing DPBM on Spark, a recent and fast distributed computing environment, greatly reduces the memory usage of each computer node. The experimental results show that DPBM performs better than MapReduce-based algorithms for frequent itemset mining in terms of speed and scalability.
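The single-scan, matrix-based idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is an assumption-laden illustration of generic Boolean-matrix Apriori, not the DPBM algorithm itself: the abstract does not describe DPBM's novel pruning technique or its Spark implementation, so neither is reproduced here. The function names build_bit_rows and frequent_itemsets are hypothetical, and each matrix row is stored as a Python integer bitmask.

```python
from itertools import combinations


def build_bit_rows(transactions):
    """Scan the input once and build one bitmask per item.

    Bit t of rows[item] is set when transaction t contains the item,
    so each mask is one row of an item-by-transaction Boolean matrix.
    """
    rows = {}
    for t, items in enumerate(transactions):
        for item in set(items):
            rows[item] = rows.get(item, 0) | (1 << t)
    return rows


def support_of(mask):
    """Count the set bits, i.e. the transactions containing the itemset."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets.

    Standard Apriori candidate generation and subset pruning are used;
    this is NOT the pruning technique proposed in the paper.
    """
    rows = build_bit_rows(transactions)
    # Level 1: keep items whose row has at least min_support set bits.
    current = {(item,): mask for item, mask in rows.items()
               if support_of(mask) >= min_support}
    result = {itemset: support_of(mask) for itemset, mask in current.items()}

    while current:
        next_level = {}
        for a, b in combinations(sorted(current), 2):
            # Join two k-itemsets that share their first k-1 items.
            if a[:-1] != b[:-1]:
                continue
            candidate = a + (b[-1],)
            # Apriori property: every k-subset must itself be frequent.
            if any(sub not in current
                   for sub in combinations(candidate, len(candidate) - 1)):
                continue
            # Support comes from a bitwise AND, with no rescan of the data.
            mask = current[a] & current[b]
            if support_of(mask) >= min_support:
                next_level[candidate] = mask
                result[candidate] = support_of(mask)
        current = next_level
    return result
```

Calling frequent_itemsets with a list of transactions (each a list of sortable items) and an absolute minimum support count returns the frequent itemsets with their supports. After the initial scan, all support counting works on the in-memory bit rows, which is why the input is read only once.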
Keywords
data mining; input-output programs; matrix algebra; trees (mathematics); FP-growth algorithm; FP-tree; I/O load; Spark; Apriori algorithm; association rules mining; distributed frequent itemset mining algorithm; distributed matrix-based pruning algorithm; MapReduce; distributed algorithm; frequent itemset mining; matrix-pruning
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Supported Cooperative Work in Design (CSCWD), 2015 IEEE 19th International Conference on
Conference_Location
Calabria
Print_ISBN
978-1-4799-2001-3
Type
conf
DOI
10.1109/CSCWD.2015.7230970
Filename
7230970