DocumentCode :
1733864
Title :
Scalable APRIORI-Based Frequent Pattern Discovery
Author :
Chester, Sean ; Sandler, Ian ; Thomo, Alex
Author_Institution :
Univ. of Victoria, Victoria, BC, Canada
Volume :
1
fYear :
2009
Firstpage :
48
Lastpage :
55
Abstract :
Frequent pattern discovery, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for the past sixteen years. In that time, the size of datasets has grown much faster than the ability of existing algorithms to handle them. Consequently, improvements are needed. In this paper, we take the classic algorithm for the problem, Apriori, and by adding a vertical sort drastically improve its performance characteristics when processing very large datasets. We use the large benchmark dataset webdocs from the FIMI 2004 conference to contrast our performance against several state-of-the-art implementations, and we demonstrate both equal efficiency with lower memory usage at all support thresholds and the ability to mine at support thresholds not yet attempted in the literature. We also indicate how this work can be extended to achieve even more impressive results.
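The abstract takes the classic Apriori algorithm as its starting point. For orientation, here is a minimal sketch of textbook level-wise Apriori: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, and count the survivors in a pass over the data. This is an illustrative assumption, not the authors' implementation; it does not include the paper's vertical-sort modification, whose details this record does not give. The function name `apriori` and the use of an absolute support count as the threshold are hypothetical choices.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook Apriori sketch (not the paper's vertical-sort variant).

    Returns {frozenset(itemset): support_count} for every itemset that
    occurs in at least `min_support` transactions (an absolute count).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count each single item with one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join pairs of frequent (k-1)-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Apriori property: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(apriori(data, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With a threshold of 3, all single items and all pairs are frequent, but {a, b, c} occurs in only two transactions and is dropped. The one-pass-per-level support counting is what makes plain Apriori expensive on very large datasets such as webdocs, which is the cost the paper sets out to reduce.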
Keywords :
data mining; frequent pattern discovery; scalable Apriori; Data engineering; Design engineering; Frequency; Itemsets; Sorting; Technological innovation; apriori;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Science and Engineering, 2009. CSE '09. International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4244-5334-4
Electronic_ISBN :
978-0-7695-3823-5
Type :
conf
DOI :
10.1109/CSE.2009.51
Filename :
5283015