Title :
Frequent Itemset Mining on Large-Scale Shared Memory Machines
Author :
Zhang, Yan ; Zhang, Fan ; Bakos, Jason
Author_Institution :
Dept. of CSE, Univ. of South Carolina, Columbia, SC, USA
Abstract :
Frequent Itemset Mining (FIM) is a data mining task that finds frequently occurring subsets in a database of itemsets. FIM is a non-numerical, data-intensive computation that is frequently used in machine learning and computational biology applications. The development of increasingly efficient FIM algorithms is an active field, but exposing and exploiting parallelism is seldom emphasized when new FIM algorithms are developed. In this paper, we explore parallel implementations of two FIM algorithms, Apriori and Eclat, each using three different representations: vertical transaction ID set, vertical bit vector, and diffset. We implemented these algorithms using OpenMP and evaluated their scalability on "Blacklight", the 4096-core Intel Nehalem-EX SGI Altix shared-memory machine on TeraGrid, using from 16 processors (one blade) to 256 processors (16 blades). We found that, while scalability generally depends on the input data, Apriori scales only when used with diffset. In contrast, Eclat is generally scalable but achieves its best scalability with diffset.
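As an illustration of the three representations named in the abstract, the following minimal Python sketch computes the support of the 2-itemset {a, c} on a toy database in each of them. It is not the authors' OpenMP implementation; the database contents and all names (`transactions`, `build_tidsets`, `tidset_to_bitvector`) are assumptions made for this example, and the diffset rule used is the standard one, support(XY) = support(X) - |t(X) - t(Y)|.

```python
# Toy transaction database (hypothetical data): transaction id -> set of items.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "d"},
    4: {"b", "c"},
    5: {"a", "b", "c", "d"},
}


def build_tidsets(db):
    """Vertical layout: map each item to the set of transaction ids containing it."""
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def tidset_to_bitvector(tidset):
    """Encode a tidset as an integer whose bit (tid - 1) is set for each tid."""
    bits = 0
    for tid in tidset:
        bits |= 1 << (tid - 1)
    return bits


tidsets = build_tidsets(transactions)
t_a, t_c = tidsets["a"], tidsets["c"]

# 1) Vertical tidset: support({a, c}) is the size of the tidset intersection.
support_tidset = len(t_a & t_c)

# 2) Vertical bit vector: bitwise AND, then count the set bits.
and_bits = tidset_to_bitvector(t_a) & tidset_to_bitvector(t_c)
support_bitvector = bin(and_bits).count("1")

# 3) Diffset: keep only the tids that contain a but not c, and subtract
#    their count from the support of the prefix itemset {a}.
diffset_ac = t_a - t_c
support_diffset = len(t_a) - len(diffset_ac)

print(support_tidset, support_bitvector, support_diffset)
```

All three computations yield the same support. They differ in how much data must be stored and scanned per candidate itemset, which is the kind of property that can influence the scalability differences reported above.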
Keywords :
data mining; message passing; shared memory systems; Apriori; Eclat; Intel Nehalem-EX SGI Altix shared-memory machine; OpenMP; Teragrid Blacklight; computational biology application; frequent itemset mining; large-scale shared memory machine; machine learning; nonnumerical data intensive computation; parallel implementation; vertical bit vector; vertical transaction set; Algorithm design and analysis; Blades; Instruction sets; Itemsets; Machine learning algorithms; Scalability; parallel; shared memory;
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
DOI :
10.1109/CLUSTER.2011.69