• DocumentCode
    2513871
  • Title

    Balanced parallel FP-Growth with MapReduce

  • Author

    Zhou, Le ; Zhong, Zhiyong ; Chang, Jin ; Li, Junjie ; Huang, Joshua Zhexue ; Feng, Shengzhong

  • Author_Institution
    Center for High Performance Comput., Chinese Acad. of Sci., Shenzhen, China
  • fYear
    2010
  • fDate
    28-30 Nov. 2010
  • Firstpage
    243
  • Lastpage
    246
  • Abstract
    Frequent itemset mining (FIM) plays an essential role in mining associations, correlations, and many other important data mining tasks. Unfortunately, as datasets grow larger day by day, most FIM algorithms in the literature become ineffective because of either excessive resource requirements or excessive communication cost. In this paper, we propose BPFP, a balanced parallel FP-Growth algorithm based on the PFP algorithm [1], which parallelizes FP-Growth using the MapReduce approach. BPFP adds a load-balancing feature to PFP, which improves parallelization and thereby performance. In our empirical study, BPFP outperformed PFP, which uses a simple grouping strategy.
  • Keywords
    data mining; distributed processing; BPFP; FIM algorithms; MapReduce; balanced parallel FP-growth algorithm; frequent itemset mining; Algorithm design and analysis; Clustering algorithms; Data mining; Estimation; Itemsets; Partitioning algorithms; Algorithms; Distributed computing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-8883-4
  • Type

    conf

  • DOI
    10.1109/YCICT.2010.5713090
  • Filename
    5713090