• DocumentCode
    3090261
  • Title
    Developing Novel and Effective Approach for Association Rule Mining Using Progressive Sampling
  • Author
    Umarani, V.; Punithavalli, M.
  • Author_Institution
    Dept. of Comput. Sci., Sri Ramakrishna Coll. of Arts & Sci. for Women, Coimbatore, India
  • Volume
    1
  • fYear
    2009
  • fDate
    28-30 Dec. 2009
  • Firstpage
    610
  • Lastpage
    614
  • Abstract
    A challenging task in data mining is discovering association rules from a large database. Most existing association rule mining algorithms make repeated passes over the entire database to determine the frequent itemsets, which is likely to incur an extremely high I/O overhead. A simple but effective way to overcome this problem is to sample the database so that the sample yields rules with the highest achievable accuracy on the large database. Numerous researchers have proposed sampling approaches for faster and more efficient mining of association rules. In this paper, we propose a novel and effective progressive sampling-based approach for mining association rules from a large database. Initially, the frequent patterns are extracted using the Apriori algorithm from an initial sample that is selected based on the temporal characteristics and the size of the database. Using the frequent itemsets generated, the negative border of the initial sample is obtained and sorted. Subsequently, the midpoint itemset in the sorted negative border is scanned in the concrete database to check whether it is frequent. Based on the support level computed for the midpoint itemset, either the sample size is progressively increased to determine an optimal sample, or the current sample is treated as optimal and association rules are mined from it. The experimental results demonstrate the efficiency of the proposed progressive sampling approach in mining association rules effectively.
  • Keywords
    data mining; sampling methods; very large databases; Apriori algorithm; association rule mining; frequent itemsets; frequent patterns; large database; progressive sampling; Application software; Art; Association rules; Clustering algorithms; Computer science; Databases; Educational institutions; Itemsets; Apriori; Association Rule Mining (ARM); Negative border; Sampling; Temporal
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Electrical Engineering, 2009. ICCEE '09. Second International Conference on
  • Conference_Location
    Dubai
  • Print_ISBN
    978-1-4244-5365-8
  • Electronic_ISBN
    978-0-7695-3925-6
  • Type
    conf
  • DOI
    10.1109/ICCEE.2009.211
  • Filename
    5380173
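
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the sample is taken as a prefix of a time-ordered transaction list, the negative border is sorted by descending support in the sample, the sample grows by a fixed step, and the sample is enlarged when the midpoint itemset turns out to be frequent in the full database. All function and parameter names (support, frequent_itemsets, negative_border, progressive_sample, initial_size, step) are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Plain level-wise Apriori; returns {frozenset: support}."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    frequent = {}
    while level:
        current = {c: support(c, transactions) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            u = a | b
            if len(u) == len(a) + 1 and all(
                frozenset(s) in current for s in combinations(u, len(a))
            ):
                candidates.add(u)
        level = sorted(candidates, key=sorted)
    return frequent


def negative_border(transactions, frequent):
    """Itemsets that are not frequent but whose immediate subsets all are."""
    items = {i for t in transactions for i in t}
    border = {frozenset([i]) for i in items if frozenset([i]) not in frequent}
    for f in frequent:
        for i in items - f:
            c = f | {i}
            if c not in frequent and all(
                frozenset(s) in frequent for s in combinations(c, len(c) - 1)
            ):
                border.add(c)
    return border


def progressive_sample(database, min_support, initial_size, step):
    """Grow a prefix sample until the midpoint check passes (assumed rule)."""
    size = min(initial_size, len(database))
    while True:
        sample = database[:size]  # assumption: database is time-ordered
        frequent = frequent_itemsets(sample, min_support)
        # Assumption: sort by descending sample support, ties by item names.
        border = sorted(
            negative_border(sample, frequent),
            key=lambda c: (-support(c, sample), sorted(c)),
        )
        if not border or size >= len(database):
            return sample, frequent
        midpoint = border[len(border) // 2]
        # Assumption: if the midpoint itemset is infrequent in the full
        # database, the sample is accepted; otherwise it is enlarged.
        if support(midpoint, database) < min_support:
            return sample, frequent
        size = min(size + step, len(database))
```

In this sketch only one itemset per iteration is counted against the full database, which is the I/O saving the abstract points to; association rules would then be generated from the frequent itemsets of the accepted sample.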