• DocumentCode
    379132
  • Title

    How good are association-rule mining algorithms?

  • Author

    Pudi, Vikram ; Haritsa, Jayant R.

  • Author_Institution
    Database Syst. Lab, Indian Inst. of Sci., Bangalore, India
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    276
  • Abstract
    Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining process. Clearly, any practical algorithm has to do at least this much work in order to generate mining rules. While the notion of the Oracle is conceptually simple, its construction is not equally straightforward. In particular, it is critically dependent on the choice of data structures and database organizations used during the counting process. We present a carefully engineered implementation of Oracle that makes the best choices for these design parameters at each stage of the counting process. We also present a new mining algorithm, called ARMOR (Association Rule Mining based on ORacle), whose structure is derived by making minimal changes to Oracle, and which is guaranteed to complete in two passes over the database. This is in marked contrast to earlier approaches, which designed new algorithms by trying to address the limitations of previous online algorithms. Although ARMOR is derived from Oracle, it shares the positive features of a variety of previous algorithms such as PARTITION, CARMA, AS-CPA, VIPER and DELTA. Our empirical study shows that ARMOR consistently performs within a factor of two of Oracle, over both real and synthetic databases.
  • Keywords
    data mining; relational databases; software performance evaluation; ARMOR; Oracle algorithm; algorithm performance improvement; association rule mining algorithms; counting process; data structures; database organizations; design parameters; frequent item sets; online algorithms; Algorithm design and analysis; Association rules; Counting circuits; Data mining; Data structures; Database systems; Design engineering; Itemsets; Partitioning algorithms; Spatial databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2002. Proceedings. 18th International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1531-2
  • Type

    conf

  • DOI
    10.1109/ICDE.2002.994730
  • Filename
    994730
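
The Oracle idea described in the abstract (the frequent item sets are already known, so only their supports need to be counted in one scan) can be illustrated with a minimal sketch. This is not the carefully engineered implementation the paper presents: the function name `count_supports`, the use of plain Python sets, and the toy transactions are all assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, List


def count_supports(
    transactions: Iterable[List[str]],
    known_frequent: Iterable[FrozenSet[str]],
) -> Dict[FrozenSet[str], int]:
    """Count the support of each pre-identified item set in a single scan.

    Hypothetical helper: a naive subset test per item set, not the
    optimized data structures discussed in the paper.
    """
    counts = {itemset: 0 for itemset in known_frequent}
    for transaction in transactions:      # one pass over the database
        items = set(transaction)
        for itemset in counts:
            if itemset <= items:          # item set is contained in the transaction
                counts[itemset] += 1
    return counts


# Toy database and a hand-picked list of "known" frequent item sets (assumed data).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
    ["bread", "butter"],
]
known = [
    frozenset(["bread"]),
    frozenset(["bread", "milk"]),
    frozenset(["butter"]),
]
supports = count_supports(transactions, known)
```

The naive subset test costs one check per item set per transaction. The abstract's remark that the Oracle depends critically on the choice of data structures and database organizations refers to exactly this kind of cost, which the sketch makes no attempt to reduce.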