DocumentCode :
379132
Title :
How good are association-rule mining algorithms?
Author :
Pudi, Vikram ; Haritsa, Jayant R.
Author_Institution :
Database Syst. Lab, Indian Inst. of Sci., Bangalore, India
fYear :
2002
fDate :
2002
Firstpage :
276
Abstract :
Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present a new mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and which is guaranteed to complete in two passes over the database. This is in marked contrast to earlier approaches, which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.
Keywords :
data mining; relational databases; software performance evaluation; ARMOR; Oracle algorithm; algorithm performance improvement; association rule mining algorithms; counting process; data structures; database organizations; design parameters; frequent item sets; online algorithms; Algorithm design and analysis; Association rules; Counting circuits; Data mining; Data structures; Database systems; Design engineering; Itemsets; Partitioning algorithms; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2002. Proceedings. 18th International Conference on
Conference_Location :
San Jose, CA
ISSN :
1063-6382
Print_ISBN :
0-7695-1531-2
Type :
conf
DOI :
10.1109/ICDE.2002.994730
Filename :
994730