Title :
IOMRA - A High Efficiency Frequent Itemset Mining Algorithm Based on the MapReduce Computation Model
Author :
Sheng-Hui Liu ; Shi-Jia Liu ; Shi-Xuan Chen ; Kun-Ming Yu
Author_Institution :
Sch. of Software, Harbin Univ. of Sci. & Technol., Harbin, China
Abstract :
The goal of Frequent Itemset Mining (FIM) is to find the largest number of frequently occurring subsets of items in a large transaction database. Previous studies exploited multicore computing to sharply reduce the execution time of the Apriori algorithm. When a data set reaches terabytes in size and a single host can no longer handle the required number of operations, connecting multiple computers into a supercomputer to speed up execution is the obvious solution. Several parallel Apriori algorithms based on the MapReduce framework have been proposed. With these algorithms, however, memory is quickly exhausted and communication cost rises sharply, which greatly reduces execution efficiency. In this paper, we present an improved Apriori algorithm that uses the length of each transaction to determine the maximum size of the merged candidate itemsets. By reducing the production of low-frequency itemsets in the Map function, memory exhaustion is alleviated and execution efficiency is greatly improved.
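The idea of bounding candidate itemsets by transaction length can be illustrated with a minimal single-process sketch. It is not the authors' IOMRA implementation, which the abstract does not specify in detail. It assumes one plausible reading: in pass k, the Map step skips any transaction with fewer than k items, and mining stops once k exceeds the longest transaction. The names map_phase, reduce_phase, gen_candidates, and mine are hypothetical, and the Map and Reduce steps are plain Python functions rather than Hadoop jobs.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, candidates, k):
    """Emit (itemset, 1) for each k-candidate contained in one transaction."""
    items = set(transaction)
    # Assumed length-based pruning: a transaction with fewer than k items
    # cannot contain any k-itemset, so it emits nothing in pass k.
    if len(items) < k:
        return []
    return [(c, 1) for c in candidates if c <= items]


def reduce_phase(pairs, min_support):
    """Sum the counts per itemset and keep those meeting min_support."""
    counts = Counter()
    for itemset, n in pairs:
        counts[itemset] += n
    return {s: n for s, n in counts.items() if n >= min_support}


def gen_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-candidates (classic Apriori prune)."""
    prev = set(frequent)
    out = set()
    for a in prev:
        for b in prev:
            u = a | b
            # Keep u only if every (k-1)-subset of it is itself frequent.
            if len(u) == k and all(
                frozenset(s) in prev for s in combinations(u, k - 1)
            ):
                out.add(u)
    return out


def mine(transactions, min_support):
    """Level-wise mining; passes are capped by the longest transaction."""
    max_len = max((len(set(t)) for t in transactions), default=0)
    candidates = {frozenset([i]) for t in transactions for i in t}
    k, result = 1, {}
    while candidates and k <= max_len:
        pairs = [p for t in transactions for p in map_phase(t, candidates, k)]
        frequent = reduce_phase(pairs, min_support)
        result.update(frequent)
        k += 1
        candidates = gen_candidates(frequent, k)
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
    for itemset, support in sorted(
        mine(data, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
    ):
        print(sorted(itemset), support)
```

In this sketch the pruning only saves work in the Map step; the set of frequent itemsets is the same as plain level-wise Apriori would produce.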
Keywords :
data mining; parallel algorithms; FIM; IOMRA algorithm; Map function; MapReduce computation model; high efficiency frequent itemset mining algorithm; memory exhaustion; multicore computing; parallel Apriori algorithm; transaction database; Algorithm design and analysis; Computers; Data mining; Educational institutions; Itemsets; Memory management; Parallel processing; Apriori; Frequent Itemset Mining; Hadoop; MapReduce;
Conference_Title :
Computational Science and Engineering (CSE), 2014 IEEE 17th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-7980-6
DOI :
10.1109/CSE.2014.247