  • DocumentCode
    2995283
  • Title
    PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
  • Author
    Xiao, Tao ; Yuan, Chunfeng ; Huang, Yihua
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Nanjing Univ., Nanjing, China
  • fYear
    2011
  • fDate
    9-11 Dec. 2011
  • Firstpage
    252
  • Lastpage
    257
  • Abstract
    Many algorithms have been proposed over the past decades to mine frequent sets efficiently in transaction databases, including the SON algorithm proposed by Savasere, Omiecinski and Navathe. This paper introduces the SON algorithm, explains why it is well suited to parallelization, and illustrates how to adapt it to the MapReduce paradigm. We then propose a parallelized SON algorithm, PSON, and implement it in Hadoop. Our study suggests that PSON can mine frequent itemsets from a very large database with good performance. The experimental results show that, when mining frequent sets, the time cost increases almost linearly with the size of the dataset and decreases approximately linearly with the number of cluster nodes. Consequently, we conclude that PSON works well for mining frequent sets from massive datasets, with good scalability and speed-up.
  • Keywords
    data mining; database management systems; distributed processing; pattern clustering; Hadoop; MapReduce; cluster nodes; datasets; frequent set mining problem; parallelized SON algorithm; transaction database; Algorithm design and analysis; Distributed databases; Itemsets; Partitioning algorithms; frequent sets mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-1-4577-1808-3
  • Type
    conf
  • DOI
    10.1109/PAAP.2011.38
  • Filename
    6128512
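
The abstract describes SON as a partition-based algorithm that maps naturally onto MapReduce. The following is a minimal single-process sketch of the general two-pass SON idea, not the authors' PSON implementation: pass 1 finds itemsets that are frequent within each chunk (the "map" side) and unions them into a candidate set, and pass 2 counts every candidate over all chunks and keeps those meeting the global threshold. The names `local_frequent`, `son`, `min_frac`, `num_chunks`, and `max_size` are hypothetical, and the brute-force enumeration inside each chunk stands in for whatever in-memory miner (such as Apriori) a real system would use.

```python
from collections import Counter
from itertools import combinations


def local_frequent(chunk, min_frac, max_size):
    """Pass 1, per chunk: itemsets whose count reaches min_frac of the chunk.

    Brute-force enumeration is used here only to keep the sketch short.
    """
    threshold = min_frac * len(chunk)
    counts = Counter()
    for transaction in chunk:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return {s for s, c in counts.items() if c >= threshold}


def son(transactions, min_frac, num_chunks=2, max_size=2):
    """Return {itemset: global count} for itemsets with support >= min_frac."""
    # Split the database into chunks; each chunk could go to a separate mapper.
    chunks = [transactions[i::num_chunks] for i in range(num_chunks)]
    chunks = [c for c in chunks if c]

    # Pass 1: the union of locally frequent itemsets is the candidate set.
    # An itemset frequent in the whole database must be frequent in at
    # least one chunk, so no globally frequent itemset is missed.
    candidates = set()
    for chunk in chunks:
        candidates |= local_frequent(chunk, min_frac, max_size)

    # Pass 2: count each candidate in every chunk and sum the counts
    # (the sum plays the role of the reduce step).
    totals = Counter()
    for chunk in chunks:
        for transaction in chunk:
            items = set(transaction)
            for cand in candidates:
                if items.issuperset(cand):
                    totals[cand] += 1

    # Keep only candidates that meet the global support threshold.
    threshold = min_frac * len(transactions)
    return {s: c for s, c in totals.items() if c >= threshold}
```

For example, `son([["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]], min_frac=0.5)` keeps the three single items plus the pairs `("a", "b")` and `("a", "c")`, while `("b", "c")` is a pass-1 candidate that pass 2 rejects because it occurs in only one of the four transactions.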