• DocumentCode
    1958074
  • Title

    A distributed frequent itemset mining algorithm based on Spark

  • Author

    Feng Gui ; Yunlong Ma ; Feng Zhang ; Min Liu ; Fei Li ; Weiming Shen ; Hua Bai

  • Author_Institution
    Sch. of Electron. & Inf. Eng., Tongji Univ., Shanghai, China
  • fYear
    2015
  • fDate
    6-8 May 2015
  • Firstpage
    271
  • Lastpage
    275
  • Abstract
    Frequent itemset mining is an important step in association rule mining. Traditional frequent itemset mining algorithms have certain limitations. For example, the Apriori algorithm has to scan the input data repeatedly, which leads to high I/O load and low performance, and the FP-Growth algorithm is limited by the capacity of a computer's main memory because it needs to build an FP-tree and mine frequent itemsets from that FP-tree in memory. With the coming of the Big Data era, these limitations become more prominent when mining large-scale data. In this paper, DPBM, a distributed matrix-based pruning algorithm based on Spark, is proposed for frequent itemset mining. DPBM greatly reduces the number of candidate itemsets by introducing a novel pruning technique for the matrix-based frequent itemset mining algorithm, an improved Apriori algorithm that needs to scan the input data only once. In addition, implementing DPBM on Spark, a recent and fast distributed computing environment, greatly reduces the memory usage of each compute node. The experimental results show that DPBM performs better than MapReduce-based frequent itemset mining algorithms in terms of speed and scalability.
  • Keywords
    data mining; input-output programs; matrix algebra; trees (mathematics); FP-growth algorithm; FP-tree; I/O load; Spark; Apriori algorithm; association rules mining; distributed frequent itemset mining algorithm; distributed matrix-based pruning algorithm; MapReduce; distributed algorithm; frequent itemset mining; matrix-pruning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Supported Cooperative Work in Design (CSCWD), 2015 IEEE 19th International Conference on
  • Conference_Location
    Calabria
  • Print_ISBN
    978-1-4799-2001-3
  • Type

    conf

  • DOI
    10.1109/CSCWD.2015.7230970
  • Filename
    7230970
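
The abstract describes a matrix-based variant of Apriori that scans the input only once. The record does not give the details of DPBM or of its pruning technique, so the following is only a minimal single-machine sketch of the general idea, under stated assumptions: each item is stored as a bit row over the transactions (built in one scan), support is counted by AND-ing rows, and candidates are pruned with the standard Apriori subset rule rather than DPBM's own pruning. The function names `build_bit_rows` and `mine_frequent_itemsets` are hypothetical, and no Spark distribution is shown.

```python
from itertools import combinations


def build_bit_rows(transactions):
    """Scan the transactions once and build one bitmask per item.

    Bit t of rows[item] is set when transaction t contains the item.
    This plays the role of the boolean item-by-transaction matrix.
    """
    rows = {}
    for t, items in enumerate(transactions):
        for item in set(items):
            rows[item] = rows.get(item, 0) | (1 << t)
    return rows


def support(mask):
    """Number of transactions covered by a bitmask."""
    return bin(mask).count("1")


def mine_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    Assumption: standard Apriori subset pruning is used here, not the
    pruning technique proposed for DPBM, which the abstract does not detail.
    """
    rows = build_bit_rows(transactions)  # the only pass over the input data

    # Level 1: frequent single items and their bit rows.
    current = {
        frozenset([item]): mask
        for item, mask in rows.items()
        if support(mask) >= min_support
    }
    frequent = {}

    while current:
        for itemset, mask in current.items():
            frequent[itemset] = support(mask)

        next_level = {}
        for a, b in combinations(list(current), 2):
            candidate = a | b
            # Join step: keep only unions that grow the itemset by one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Prune step: every subset one item smaller must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Support comes from AND-ing rows, with no rescan of the input.
            mask = current[a] & current[b]
            if support(mask) >= min_support:
                next_level[candidate] = mask
        current = next_level

    return frequent
```

A small usage example, with made-up transactions chosen only for illustration:

```python
baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = mine_frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```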