DocumentCode
2864649
Title
An algorithm for in-core frequent itemset mining on streaming data
Author
Jin, Ruoming ; Agrawal, Gagan
Author_Institution
Dept. of Comput. Sci., Kent State Univ., OH, USA
fYear
2005
fDate
27-30 Nov. 2005
Abstract
Frequent item set mining is a core data mining operation and has been extensively studied over the last decade. This paper takes a new approach to this problem and makes two major contributions. First, we present a one-pass algorithm for frequent item set mining, which has deterministic bounds on the accuracy and does not require any out-of-core summary structure. Second, because our one-pass algorithm does not produce any false negatives, it can be easily extended to a two-pass accurate algorithm. Our two-pass algorithm is very memory efficient, and allows mining of datasets with a large number of distinct items and/or very low support levels. Our detailed experimental evaluation on synthetic and real datasets shows the following. First, our one-pass algorithm is very accurate in practice. Second, our algorithm requires significantly less memory than Manku and Motwani's one-pass algorithm and the multi-pass Apriori algorithm. Our two-pass algorithm outperforms Apriori and FP-tree when the number of distinct items is large and/or support levels are very low. In other cases, it is quite competitive, with the possible exception of cases where the average length of frequent item sets is quite high.
Keywords
data mining; in-core frequent item set mining; multipass Apriori algorithm; one pass algorithm; streaming data; Computer science; Data engineering; Data mining; Data structures; Frequency; Itemsets
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, Fifth IEEE International Conference on
ISSN
1550-4786
Print_ISBN
0-7695-2278-5
Type
conf
DOI
10.1109/ICDM.2005.21
Filename
1565681