DocumentCode
2995283
Title
PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
Author
Xiao, Tao ; Yuan, Chunfeng ; Huang, Yihua
Author_Institution
Dept. of Comput. Sci. & Technol., Nanjing Univ., Nanjing, China
fYear
2011
fDate
9-11 Dec. 2011
Firstpage
252
Lastpage
257
Abstract
Many algorithms have been proposed over the past decades to mine frequent sets efficiently in transaction databases, including the SON algorithm proposed by Savasere, Omiecinski and Navathe. This paper introduces the SON algorithm, explains why SON is well suited to parallelization, and illustrates how to adapt it to the MapReduce paradigm. We then propose a parallelized SON algorithm, PSON, and implement it in Hadoop. Our study suggests that PSON can mine frequent item sets from a very large database with good performance. The experimental results show that, for frequent set mining, the time cost increases almost linearly with the size of the dataset and decreases approximately linearly with the number of cluster nodes. We therefore conclude that PSON works well for mining frequent sets from massive datasets, with good scalability and speed-up.
Keywords
data mining; database management systems; distributed processing; pattern clustering; Hadoop; MapReduce; cluster nodes; datasets; frequent set mining problem; parallelized SON algorithm; transaction database; Algorithm design and analysis; Distributed databases; Itemsets; Partitioning algorithms; frequent sets mining
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures, Algorithms and Programming (PAAP), 2011 Fourth International Symposium on
Conference_Location
Tianjin
Print_ISBN
978-1-4577-1808-3
Type
conf
DOI
10.1109/PAAP.2011.38
Filename
6128512