DocumentCode
2028526
Title
Balanced Tidset-based Parallel FP-tree Algorithm for the Frequent Pattern Mining on Grid System
Author
Zhou, Jiayi ; Yu, Kun-Ming
Author_Institution
Inst. of Eng. Sci., Chung Hua Univ., Hsinchu
fYear
2008
fDate
3-5 Dec. 2008
Firstpage
103
Lastpage
108
Abstract
Mining frequent patterns from transaction-oriented databases is an important problem. Frequent patterns are essential for generating association rules, time series, etc. Most frequent pattern mining algorithms can be classified into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Recently, many FP-tree-based methods have been proposed for this problem, because this approach reduces the number of database scans. However, even for pattern growth methods, the execution time grows rapidly when the database is large and the given support is small. Parallel and distributed computing is therefore a good strategy for solving this problem. Some parallel algorithms have been proposed, but their execution time is still costly when the database is large. In this paper, we propose an efficient parallel and distributed mining algorithm, the Balanced Tidset Parallel FP-tree (BTP-tree) algorithm, for grid computing systems. A grid system is a heterogeneous computing environment; the proposed method balances the load according to the tree depth and width. To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. BTP-tree, TPFP-tree and PFP-tree were implemented, and datasets generated by the IBM Quest Synthetic Data Generator were used to verify the performance of BTP-tree. The experimental results show that BTP-tree reduces the execution time significantly and has better load-balancing capability than TPFP-tree and PFP-tree.
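The Tidset idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' BTP-tree implementation; it only shows, under simple assumptions, how an item-to-transaction-ID index built in one pass lets transactions be selected by lookup rather than by rescanning the database. The function names (`build_tidsets`, `select_transactions`, `support`) and the toy data are hypothetical.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (its tidset) in one pass."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, item):
    """Fetch the transactions containing `item` by ID lookup, without rescanning."""
    return [transactions[tid] for tid in sorted(tidsets.get(item, ()))]


def support(tidsets, itemset):
    """Support count of an itemset: size of the intersection of its items' tidsets."""
    sets = [tidsets.get(i, set()) for i in itemset]
    return len(set.intersection(*sets)) if sets else 0


# Toy database (assumed for illustration only).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "d"},
]
tidsets = build_tidsets(transactions)
```

In a distributed setting, a node that needs the transactions for a given item could presumably request only the IDs in that item's tidset; how BTP-tree actually exchanges and balances them is described in the paper itself, not here.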
Keywords
data mining; grid computing; resource allocation; Tidset-based parallel FP-tree algorithm; association rules; balanced Tidset parallel FP-tree algorithm; distributed mining algorithm; frequent pattern mining algorithm; grid computing system; heterogeneous computing environment; load balancing; parallel-distributed computing; time series; transaction identification set; transaction-oriented database; Association rules; Computer science; Concurrent computing; Data engineering; Data mining; Data structures; Grid computing; Knowledge engineering; Pervasive computing; Transaction databases; association rules; frequent pattern mining; grid computing; load balancing; tidset;
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantics, Knowledge and Grid, 2008. SKG '08. Fourth International Conference on
Conference_Location
Beijing
Print_ISBN
978-0-7695-3401-5
Electronic_ISBN
978-0-7695-3401-5
Type
conf
DOI
10.1109/SKG.2008.65
Filename
4725902