How good are association-rule mining algorithms?

Author

Pudi, Vikram ; Haritsa, Jayant R.

Author_Institution

Database Syst. Lab, Indian Inst. of Sci., Bangalore, India

fYear

2002

fDate

2002

Firstpage

276

Abstract

Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present a new mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and which is guaranteed to complete in two passes over the database. This is in marked contrast to earlier approaches, which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.
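The Oracle idea described in the abstract can be illustrated with a minimal sketch, assuming transactions are plain lists of items and the frequent item sets are supplied up front. The function name `oracle_count` and the sample data are hypothetical, and the naive subset test used here stands in for the carefully engineered data structures and database organizations the paper refers to, which this record does not describe.

```python
def oracle_count(transactions, frequent_itemsets):
    """Count the support of each pre-known item set in one scan.

    Illustrative only: uses a naive subset test per item set,
    not the engineered counting structures of the paper.
    """
    itemsets = [frozenset(s) for s in frequent_itemsets]
    supports = {s: 0 for s in itemsets}
    for transaction in transactions:  # single pass over the database
        items = set(transaction)
        for s in itemsets:
            if s <= items:  # item set is contained in this transaction
                supports[s] += 1
    return supports


# Hypothetical toy database and pre-known frequent item sets.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
known = [{"a"}, {"c"}, {"a", "c"}]
supports = oracle_count(transactions, known)
```

Because the item sets are known in advance, no candidate generation or pruning is needed; the only cost is the counting pass, which is why the abstract treats this as a lower bound on the work any practical algorithm must do.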

Keywords

data mining; relational databases; software performance evaluation; ARMOR; Oracle algorithm; algorithm performance improvement; association rule mining algorithms; counting process; data structures; database organizations; design parameters; frequent item sets; online algorithms; Algorithm design and analysis; Association rules; Counting circuits; Data mining; Data structures; Database systems; Design engineering; Itemsets; Partitioning algorithms; Spatial databases

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Engineering, 2002. Proceedings. 18th International Conference on

Conference_Location

San Jose, CA

ISSN

1063-6382

Print_ISBN

0-7695-1531-2

Type

conf

DOI

10.1109/ICDE.2002.994730

Filename

994730