• DocumentCode
    2028526
  • Title
    Balanced Tidset-based Parallel FP-tree Algorithm for the Frequent Pattern Mining on Grid System
  • Author
    Zhou, Jiayi ; Yu, Kun-Ming
  • Author_Institution
    Inst. of Eng. Sci., Chung Hua Univ., Hsinchu
  • fYear
    2008
  • fDate
    3-5 Dec. 2008
  • Firstpage
    103
  • Lastpage
    108
  • Abstract
    Mining frequent patterns from transaction-oriented databases is an important problem. Frequent patterns are essential for generating association rules, time series, etc. Most frequent pattern mining algorithms can be classified into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Recently, many FP-tree-based methods have been proposed for this problem, because this approach reduces the number of database scans. However, even for pattern growth methods, the execution time grows rapidly when the database is large and the given support is small. Parallel-distributed computing is therefore a good strategy for solving this problem. Some parallel algorithms have been proposed, but their execution time is still costly when the database is large. In this paper, we propose an efficient parallel and distributed mining algorithm, the Balanced Tidset Parallel FP-tree (BTP-tree) algorithm, for grid computing systems. Because a grid system is a heterogeneous computing environment, the proposed method balances the load according to the tree depth and width. To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. BTP-tree, TPFP-tree and PFP-tree were implemented, and datasets generated by the IBM Quest Synthetic Data Generator were used to verify the performance of BTP-tree. The experimental results show that BTP-tree reduces the execution time significantly and has better load-balancing capability than TPFP-tree and PFP-tree.
  • Keywords
    data mining; grid computing; resource allocation; Tidset-based parallel FP-tree algorithm; association rules; balanced Tidset parallel FP-tree algorithm; distributed mining algorithm; frequent pattern mining algorithm; grid computing system; heterogeneous computing environment; load balancing; parallel-distributed computing; time series; transaction identification set; transaction-oriented database; Computer science; Concurrent computing; Data engineering; Data structures; Knowledge engineering; Pervasive computing; Transaction databases; frequent pattern mining; tidset
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantics, Knowledge and Grid, 2008. SKG '08. Fourth International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-0-7695-3401-5
  • Electronic_ISBN
    978-0-7695-3401-5
  • Type
    conf
  • DOI
    10.1109/SKG.2008.65
  • Filename
    4725902