Title :
Balanced Tidset-based Parallel FP-tree Algorithm for the Frequent Pattern Mining on Grid System
Author :
Zhou, Jiayi ; Yu, Kun-Ming
Author_Institution :
Inst. of Eng. Sci., Chung Hua Univ., Hsinchu
Abstract :
Mining frequent patterns from transaction-oriented databases is an important problem. Frequent patterns are essential for generating association rules, time series, etc. Most frequent pattern mining algorithms can be classified into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Recently, many FP-tree-based methods have been proposed for this problem, because this approach reduces the number of database scans. However, even for pattern growth methods, the execution time grows rapidly when the database is large and the given support is small. Parallel and distributed computing is therefore a good strategy for solving this problem. Some parallel algorithms have been proposed, but their execution time is still costly when the database is large. In this paper, we propose an efficient parallel and distributed mining algorithm, the Balanced Tidset Parallel FP-tree (BTP-tree) algorithm, for grid computing systems. Because a grid system is a heterogeneous computing environment, the proposed method balances the load according to the tree depth and width. To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. BTP-tree, TPFP-tree, and PFP-tree were implemented, and datasets generated by the IBM Quest Synthetic Data Generator were used to verify the performance of BTP-tree. The experimental results show that BTP-tree reduces the execution time significantly and has better load-balancing capability than TPFP-tree and PFP-tree.
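The Tidset idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' BTP-tree implementation; it only shows, under simplifying assumptions, how an index from each item to the IDs of the transactions containing it lets transactions be selected by lookup rather than by rescanning the database. The function names (build_tidsets, select_transactions) and the toy data are hypothetical.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (its tidset) that contain it.

    `transactions` is assumed to be a dict of {tid: set_of_items}.
    """
    tidsets = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, item):
    """Fetch the transactions containing `item` through its tidset.

    Only the IDs listed in the tidset are touched; the rest of the
    database is not scanned.
    """
    return {tid: transactions[tid] for tid in sorted(tidsets.get(item, ()))}


# Toy database (hypothetical data, for illustration only).
transactions = {
    1: {"a", "b", "c"},
    2: {"b", "d"},
    3: {"a", "c"},
    4: {"c", "d"},
}

tidsets = build_tidsets(transactions)
selected = select_transactions(transactions, tidsets, "a")
print(selected)
```

In a distributed setting, a node that needs the transactions for a given item could request only the IDs in that item's tidset, which is the kind of direct selection the abstract describes; how BTP-tree actually exchanges and balances them is not specified here.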
Keywords :
data mining; grid computing; resource allocation; Tidset-based parallel FP-tree algorithm; association rules; balanced Tidset parallel FP-tree algorithm; distributed mining algorithm; frequent pattern mining algorithm; grid computing system; heterogeneous computing environment; load balancing; parallel-distributed computing; time series; transaction identification set; transaction-oriented database; Computer science; Concurrent computing; Data engineering; Data structures; Knowledge engineering; Pervasive computing; Transaction databases; frequent pattern mining; tidset
Conference_Title :
Semantics, Knowledge and Grid, 2008. SKG '08. Fourth International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-0-7695-3401-5
Electronic_ISBN :
978-0-7695-3401-5
DOI :
10.1109/SKG.2008.65