Title :
Sequence-Growth: A Scalable and Effective Frequent Itemset Mining Algorithm for Big Data Based on MapReduce Framework
Author :
Yen-Hui Liang ; Shiow-Yang Wu
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Dong Hwa Univ., Hualien, Taiwan
Abstract :
Frequent itemset mining (FIM) is an important research topic because it is widely applied in the real world to find frequent itemsets and to mine human behavior patterns. The FIM process is both memory- and compute-intensive. As data grows exponentially every day, the problems of efficiency and scalability become more severe. In this paper, we propose a new distributed FIM algorithm, called Sequence-Growth, and implement it on the MapReduce framework. Our algorithm applies the idea of lexicographical order to construct a tree, called the "lexicographical sequence tree", that allows us to find all frequent itemsets without an exhaustive search over the transaction databases. In addition, the breadth-wise support-based pruning strategy is an important factor contributing to the efficiency and scalability of our algorithm. To test the performance of our algorithm, we conduct experiments covering various aspects on the MapReduce framework with large datasets. The results show the good efficiency and scalability of Sequence-Growth, especially when dealing with big data and long itemsets. Our algorithm also introduces a new mining methodology that can easily be adapted for sequential pattern mining, trajectory pattern mining, and other association rule mining algorithms. We believe that it can make a valuable contribution to the future development of association rule mining algorithms for big data.
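The abstract names two ideas without giving pseudocode: growing candidate itemsets along a lexicographically ordered tree, and pruning each level by support before going deeper. The Python sketch below illustrates those general ideas only and is not the authors' Sequence-Growth algorithm. Assumptions: the map and reduce phases are simulated in a single process (no Hadoop), the pruning rule is the standard Apriori subset check swapped in for the paper's breadth-wise support-based pruning, items are mutually comparable (e.g. strings), and the function names `map_phase`, `reduce_phase`, `children`, and `mine` are hypothetical.

```python
from collections import defaultdict


def map_phase(transactions, candidates):
    """Simulated map step: emit (itemset, 1) for each candidate a transaction contains."""
    for transaction in transactions:
        items = set(transaction)
        for cand in candidates:
            if items.issuperset(cand):
                yield cand, 1


def reduce_phase(pairs):
    """Simulated reduce step: sum the emitted counts per itemset key."""
    support = defaultdict(int)
    for itemset, count in pairs:
        support[itemset] += count
    return support


def children(frequent_k):
    """Grow the next tree level in lexicographic order (hypothetical helper).

    Each node is a sorted tuple. It is extended only by joining with a later
    sibling that shares its prefix, so every itemset is generated exactly once.
    """
    frequent = sorted(frequent_k)
    freq_set = set(frequent)
    nxt = []
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later sibling shares this prefix
            cand = a + (b[-1],)
            # Assumed Apriori-style pruning: every k-subset must already be frequent.
            if all(cand[:j] + cand[j + 1:] in freq_set for j in range(len(cand))):
                nxt.append(cand)
    return nxt


def mine(transactions, min_support):
    """Breadth-first, level-by-level mining; returns {itemset_tuple: support}."""
    level = sorted({(item,) for t in transactions for item in t})
    result = {}
    while level:
        support = reduce_phase(map_phase(transactions, level))
        frequent = [c for c in level if support[c] >= min_support]
        result.update((c, support[c]) for c in frequent)
        level = children(frequent)  # only frequent nodes are expanded
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    print(mine(data, 3))
    # {('a',): 4, ('b',): 4, ('c',): 4, ('a', 'b'): 3, ('a', 'c'): 3, ('b', 'c'): 3}
```

Expanding only the frequent nodes of each level is what avoids an exhaustive search: here `('a', 'b', 'c')` is generated as a candidate but dropped at support 2, so the tree stops growing. In a real MapReduce deployment each level would be one map/reduce round over partitioned transactions, which this single-process sketch does not model.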
Keywords :
Big Data; data mining; parallel processing; FIM; MapReduce framework; Sequence-Growth; association rule mining algorithm; frequent itemset mining algorithm; lexicographical order; lexicographical sequence tree; Algorithm design and analysis; Itemsets; Silicon; Trajectory; Efficiency; Frequent Pattern Mining; MapReduce; Scalability
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.65