DocumentCode :
1345696
Title :
Scalable algorithms for association mining
Author :
Zaki, Mohammed J.
Author_Institution :
Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
Volume :
12
Issue :
3
Year :
2000
Firstpage :
372
Lastpage :
390
Abstract :
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets and then forming conditional implication rules among them. We present efficient algorithms for the discovery of frequent itemsets, which forms the compute-intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks, or sublattices, each of which can be solved in memory. Efficient lattice traversal techniques are presented that quickly identify all the long frequent itemsets and, if required, their subsets. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining improvements of more than an order of magnitude for our test databases.
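The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea it describes: split the subset lattice into independent sublattices and mine each one in memory. It assumes a vertical database layout (each item mapped to the set of transaction ids that contain it) and prefix-based classes explored depth-first. The function names `vertical_layout`, `mine_class`, and `frequent_itemsets` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def vertical_layout(transactions):
    """Map each item to the set of transaction ids (tids) containing it.

    Assumption: a vertical layout is one of the "database layout schemes"
    the abstract alludes to; the abstract itself does not name it.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def mine_class(members, min_support, out):
    """Mine one sublattice: a list of (itemset, tidset) pairs sharing a prefix.

    Every member is already frequent. Joining two members intersects their
    tidsets, and the size of the intersection is the support of the union.
    """
    for i, (itemset_a, tids_a) in enumerate(members):
        out[itemset_a] = len(tids_a)
        # Build the child class whose common prefix is itemset_a.
        child = []
        for itemset_b, tids_b in members[i + 1:]:
            tids = tids_a & tids_b
            if len(tids) >= min_support:
                child.append((itemset_a + (itemset_b[-1],), tids))
        if child:
            # Each child class is independent of its siblings.
            mine_class(child, min_support, out)


def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    tidsets = vertical_layout(transactions)
    root = [
        ((item,), tids)
        for item, tids in sorted(tidsets.items())
        if len(tids) >= min_support
    ]
    out = {}
    mine_class(root, min_support, out)
    return out
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item sets and an absolute support threshold returns every frequent itemset with its support count. Because each recursive call touches only the tidsets of its own class, a class can be processed without reference to the rest of the lattice.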
Keywords :
associative processing; data mining; equivalence classes; search problems; very large databases; association mining; association rule discovery; compute intensive phase; conditional implication rules; data mining; database layout schemes; fast discovery; frequent itemsets; knowledge discovery; lattice traversal techniques; long frequent itemsets; scalable algorithms; small independent chunks; structural properties; sublattices; subset lattice search space; test databases; traversal techniques; Books; Costs; Data mining; Helium; Itemsets; Lattices; Marketing and sales; Space technology; Spatial databases; Testing;
Language :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/69.846291
Filename :
846291