  • DocumentCode
    2520705
  • Title
    An efficient parallel FP-Growth algorithm
  • Author
    Chen, Min; Gao, Xuedong; Li, HuiFei
  • Author_Institution
    Sch. of Econ. & Manage., Univ. of Sci. & Technol. Beijing, Beijing, China
  • fYear
    2009
  • fDate
    10-11 Oct. 2009
  • Firstpage
    283
  • Lastpage
    286
  • Abstract
    When the dataset is large, the FP-growth algorithm recursively generates huge numbers of conditional pattern bases and conditional FP-trees. In that case both memory usage and computational cost become high, to the point that the FP-tree cannot fit in memory. In this work, we propose a novel parallel FP-growth algorithm designed to run on a computer cluster. To avoid memory overflow, the algorithm finds all the conditional pattern bases of frequent items by a projection method, without constructing an FP-tree. It then splits the mining task into a number of independent sub-tasks, executes these sub-tasks in parallel on the nodes, and aggregates their outputs into the final result. Because each node works independently, the algorithm efficiently reduces inter-node communication cost. Experiments show that this parallel algorithm not only avoids memory overflow but also accelerates computation. In addition, it scales much better than the FP-growth algorithm.
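    The pipeline described in the abstract (projection into conditional pattern bases without an FP-tree, independent per-item sub-tasks, parallel execution, aggregation) can be sketched as below. This is a minimal single-machine sketch under stated assumptions, not the authors' implementation: the names `rank_frequent_items`, `conditional_pattern_bases`, `mine_base` and `parallel_fp_growth` are hypothetical, each sub-task is assumed to recurse by further projection (the abstract does not say how nodes mine their share), items are assumed to be mutually comparable (for deterministic tie-breaking), and a thread pool stands in for the cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def rank_frequent_items(transactions, min_support):
    """Count each item once per transaction, keep items meeting min_support,
    and rank them by descending support (ties broken by item value)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_support]
    frequent.sort(key=lambda i: (-counts[i], i))
    return {item: r for r, item in enumerate(frequent)}


def conditional_pattern_bases(transactions, rank):
    """Projection step (assumed form): for every frequent item, collect the
    prefix preceding it in each rank-sorted transaction. No FP-tree is built."""
    bases = {item: [] for item in rank}
    for t in transactions:
        ordered = sorted({i for i in t if i in rank}, key=rank.__getitem__)
        for pos, item in enumerate(ordered):
            # An empty prefix is kept: it still records one supporting transaction.
            bases[item].append(tuple(ordered[:pos]))
    return bases


def mine_base(suffix, base, min_support):
    """Independent sub-task: mine every frequent itemset ending in `suffix`
    from its conditional pattern base, recursing by further projection."""
    found = {frozenset(suffix): len(base)}  # one prefix per supporting transaction
    counts = Counter(i for prefix in base for i in prefix)
    for item, support in counts.items():
        if support >= min_support:
            # Project again: keep only the part of each prefix before `item`.
            sub_base = [p[:p.index(item)] for p in base if item in p]
            found.update(mine_base(suffix + (item,), sub_base, min_support))
    return found


def parallel_fp_growth(transactions, min_support, workers=4):
    rank = rank_frequent_items(transactions, min_support)
    bases = conditional_pattern_bases(transactions, rank)
    result = {}
    # Hypothetical stand-in for cluster nodes: one sub-task per frequent item.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(mine_base, (item,), base, min_support)
                   for item, base in bases.items()]
        for f in futures:
            # Aggregate: sub-task outputs are disjoint (keyed by lowest-ranked item).
            result.update(f.result())
    return result


if __name__ == "__main__":
    tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for itemset, support in sorted(parallel_fp_growth(tx, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

    Each frequent item's sub-task needs only its own conditional pattern base, so no communication between workers is required until the final merge, which mirrors the independence the abstract credits for low inter-node cost. Because of CPython's global interpreter lock, a thread pool gives no real CPU speedup here; `ProcessPoolExecutor` (with the call kept under an `if __name__ == "__main__":` guard) or an actual cluster framework would be needed for that.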
  • Keywords
    data mining; parallel algorithms; conditional FP-trees; conditional pattern bases; efficient parallel FP-growth algorithm; frequent pattern mining; Acceleration; Aggregates; Algorithm design and analysis; Clustering algorithms; Computational efficiency; Concurrent computing; Data mining; Data structures; Databases; Parallel algorithms; Computer cluster; FP-Growth; Parallel algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cyber-Enabled Distributed Computing and Knowledge Discovery, 2009. CyberC '09. International Conference on
  • Conference_Location
    Zhangjiajie
  • Print_ISBN
    978-1-4244-5218-7
  • Electronic_ISBN
    978-1-4244-5219-4
  • Type
    conf
  • DOI
    10.1109/CYBERC.2009.5342148
  • Filename
    5342148