Title :
Developing Novel and Effective Approach for Association Rule Mining Using Progressive Sampling
Author :
Umarani, V.; Punithavalli, M.
Author_Institution :
Department of Computer Science, Sri Ramakrishna College of Arts & Science for Women, Coimbatore, India
Abstract :
Discovering association rules in a large database is a challenging data mining task. Most existing association rule mining algorithms make repeated passes over the entire database to determine the frequent itemsets, which is likely to incur extremely high I/O overhead. A simple but effective way to overcome this problem is to mine a sample of the database, chosen so that the rules it produces are as accurate as possible for the full database. Numerous researchers have proposed sampling approaches for faster and more efficient mining of association rules. In this paper, we propose a novel and effective progressive sampling approach for mining association rules from a large database. First, frequent patterns are extracted with the Apriori algorithm from an initial sample that is selected based on the temporal characteristics and the size of the database. The negative border of the initial sample is then obtained from these frequent itemsets and sorted. Next, the itemset at the midpoint of the sorted negative border is counted against the full database to check whether it is frequent. Depending on the support computed for this midpoint itemset, either the sample size is progressively increased in search of an optimal sample, or the current sample is accepted as optimal and association rules are mined from it. The experimental results demonstrate the efficiency of the proposed progressive sampling approach for mining association rules.
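The abstract outlines the procedure but not its parameters, so the following Python sketch is one possible reading of it, not the authors' implementation. All function names are hypothetical, and several choices are assumptions: the initial sample is drawn uniformly at random (the paper's temporal selection is not reproduced), the sample doubles on each round, the negative border is sorted by support in the sample, and a midpoint itemset that turns out to be frequent in the full database is taken as the signal that the sample is too small.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def apriori(transactions, minsup):
    """Level-wise Apriori over a list of frozensets; returns {itemset: support}."""
    frequent = {}
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        level = {c: support(c, transactions) for c in candidates}
        level = {c: s for c, s in level.items() if s >= minsup}
        frequent.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return frequent


def negative_border(frequent, items):
    """Infrequent itemsets whose proper subsets are all frequent."""
    border = {frozenset([i]) for i in items if frozenset([i]) not in frequent}
    by_size = {}
    for f in frequent:
        by_size.setdefault(len(f), set()).add(f)
    for k, level in by_size.items():
        for a in level:
            for b in level:
                c = a | b
                if len(c) == k + 1 and c not in frequent and all(
                        frozenset(s) in frequent for s in combinations(c, k)):
                    border.add(c)
    return border


def progressive_sample(db, minsup, initial_fraction=0.1, growth=2.0, seed=0):
    """Grow a random sample until the midpoint border itemset is infrequent in db."""
    rng = random.Random(seed)
    items = {i for t in db for i in t}  # one-time inventory of all items
    n = max(1, int(len(db) * initial_fraction))
    while True:
        sample = rng.sample(db, n)
        frequent = apriori(sample, minsup)
        # Assumed sort key: support in the sample, ties broken by item order.
        border = sorted(negative_border(frequent, items),
                        key=lambda s: (support(s, sample), sorted(s)))
        if not border:
            return sample, frequent
        midpoint = border[len(border) // 2]
        # Only the midpoint itemset is counted against the full database.
        if support(midpoint, db) < minsup or n == len(db):
            return sample, frequent  # accept the sample as "optimal"
        n = min(len(db), int(n * growth))


def association_rules(frequent, minconf):
    """Rules lhs -> rhs with confidence = supp(lhs | rhs) / supp(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= minconf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules
```

Under these assumptions, a run over a list of frozensets `transactions` would call `progressive_sample(transactions, minsup=0.02)` and pass the returned frequent itemsets to `association_rules(freq, minconf=0.6)`. Doubling is the geometric schedule often used in progressive sampling; the paper may use a different increment or acceptance test.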
Keywords :
data mining; sampling methods; very large databases; Apriori algorithm; association rule mining; frequent itemsets; frequent patterns; large database; progressive sampling; Application software; Art; Association rules; Clustering algorithms; Computer science; Databases; Educational institutions; Itemsets; Apriori; Association Rule Mining (ARM); Negative border; Sampling; Temporal;
Conference_Title :
Computer and Electrical Engineering, 2009. ICCEE '09. Second International Conference on
Conference_Location :
Dubai
Print_ISBN :
978-1-4244-5365-8
Electronic_ISBN :
978-0-7695-3925-6
DOI :
10.1109/ICCEE.2009.211