DocumentCode :
2054353
Title :
Frequent Itemset Mining on Large-Scale Shared Memory Machines
Author :
Zhang, Yan ; Zhang, Fan ; Bakos, Jason
Author_Institution :
Dept. of CSE, Univ. of South Carolina, Columbia, SC, USA
fYear :
2011
fDate :
26-30 Sept. 2011
Firstpage :
585
Lastpage :
589
Abstract :
Frequent Itemset Mining (FIM) is a data mining task that finds frequently occurring subsets in a database of itemsets. FIM is a non-numerical, data-intensive computation that is frequently used in machine learning and computational biology applications. The development of increasingly efficient FIM algorithms is an active field, but exposing and exploiting parallelism is seldom emphasized in new FIM algorithms. In this paper, we explore parallel implementations of two FIM algorithms, Apriori and Eclat, each using three different representations: vertical transaction ID set, vertical bit vector, and diffset. We implemented these algorithms using OpenMP and evaluated their scalability on the 4096-core Intel Nehalem-EX SGI Altix shared-memory machine TeraGrid "Blacklight", using from 16 processors (one blade) to 256 processors (16 blades). We found that, while scalability generally depends on the input data, Apriori is scalable only when used with diffset. Eclat, by contrast, is generally scalable but achieves its best scalability with diffset.
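The two vertical representations named in the abstract can be illustrated with a minimal serial sketch. This is not the authors' OpenMP implementation; the function names (`vertical_tidsets`, `eclat_tidset`, `eclat_diffset`) and the toy database are assumptions made for illustration. The tidset variant intersects transaction ID sets, while the diffset variant stores only the IDs lost when an itemset is extended and derives support by subtraction.

```python
def vertical_tidsets(transactions):
    """Map each item to the set of transaction IDs that contain it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat_tidset(transactions, min_support):
    """Depth-first Eclat: support of an itemset is the size of its tidset."""
    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset of prefix + item)
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            nxt = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids  # t(PXY) = t(PX) & t(PY)
                if len(common) >= min_support:
                    nxt.append((other, common))
            extend(itemset, nxt)

    tidsets = vertical_tidsets(transactions)
    start = [(item, tids) for item, tids in sorted(tidsets.items())
             if len(tids) >= min_support]
    extend((), start)
    return frequent


def eclat_diffset(transactions, min_support):
    """Same search, but each node keeps a diffset: the transaction IDs that
    contain the prefix and do not contain the extended itemset."""
    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, diffset relative to prefix, support)
        for i, (item, diff, sup) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = sup
            nxt = []
            for other, other_diff, _ in candidates[i + 1:]:
                new_diff = other_diff - diff   # d(PXY) = d(PY) - d(PX)
                new_sup = sup - len(new_diff)  # sup(PXY) = sup(PX) - |d(PXY)|
                if new_sup >= min_support:
                    nxt.append((other, new_diff, new_sup))
            extend(itemset, nxt)

    tidsets = vertical_tidsets(transactions)
    all_tids = set(range(len(transactions)))
    # At the root the prefix is empty, so d(X) is the complement of t(X).
    start = [(item, all_tids - tids, len(tids))
             for item, tids in sorted(tidsets.items())
             if len(tids) >= min_support]
    extend((), start)
    return frequent


# Toy database (hypothetical data, five transactions over items a, b, c).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
                {"a", "b", "c"}]
by_tidset = eclat_tidset(transactions, min_support=3)
by_diffset = eclat_diffset(transactions, min_support=3)
```

Both functions return the same dictionary of frequent itemsets and supports; only the per-node set that is stored and combined differs, and that stored set is the working data of each search branch.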
Keywords :
data mining; message passing; shared memory systems; Apriori; Eclat; Intel Nehalem-EX SGI Altix shared-memory machine; OpenMP; TeraGrid Blacklight; computational biology application; frequent itemset mining; large-scale shared memory machine; machine learning; nonnumerical data intensive computation; parallel implementation; vertical bit vector; vertical transaction set; Algorithm design and analysis; Blades; Instruction sets; Itemsets; Machine learning algorithms; Scalability; parallel; shared memory
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location :
Austin, TX
Print_ISBN :
978-1-4577-1355-2
Electronic_ISBN :
978-0-7695-4516-5
Type :
conf
DOI :
10.1109/CLUSTER.2011.69
Filename :
6061213