DocumentCode :
2130980
Title :
Bounding and Estimating Association Rule Support from Clusters on Binary Data
Author :
Ordonez, Carlos ; Zhao, Kai ; Chen, Zhibo
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
609
Lastpage :
618
Abstract :
The theoretical relationship between association rules and machine learning techniques needs to be studied in more depth. This article studies the use of clustering as a model for association rule mining. The clustering model is exploited to bound and estimate association rule support and confidence. We first study the efficient computation of the clustering model with K-means; we show that the sufficient statistic for clustering binary data sets is the linear sum of points. We then prove that itemset support can be bounded and estimated from the model. Finally, we show that the support bounds fulfill the set downward closure property. Experiments study model accuracy and algorithm speed, paying particular attention to error behavior in support estimation. Given a sufficiently large number of clusters, the model approximates support fairly accurately. However, as the minimum support threshold decreases, accuracy also decreases. The model discovers a large fraction of frequent itemsets at different support levels. The model is compared against a traditional association rule algorithm for mining frequent itemsets and exhibits better performance at low support levels. The time complexity of computing the binary cluster model is linear in data set size, whereas the dimensionality of transaction data sets has only a marginal impact on time.
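The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes cluster labels are already available (for example from a K-means run), builds cluster weights and means from per-cluster counts and linear sums only, and then bounds and estimates itemset support. The bound formulas used here are the generic Fréchet inequalities applied per cluster, and the estimate assumes items are independent within a cluster; both are assumptions of this sketch and may differ from the formulas proved in the paper. All function names and the toy data are hypothetical.

```python
from math import prod


def cluster_model(points, labels, k):
    """Build cluster weights and means from counts and linear sums only.

    For binary data x*x == x, so the per-cluster linear sum is the only
    sum that needs to be kept (no separate sum of squares).
    """
    n, d = len(points), len(points[0])
    counts = [0] * k
    linear_sums = [[0] * d for _ in range(k)]
    for x, j in zip(points, labels):
        counts[j] += 1
        for i, v in enumerate(x):
            linear_sums[j][i] += v
    weights = [c / n for c in counts]
    means = [[s / c if c else 0.0 for s in row]
             for row, c in zip(linear_sums, counts)]
    return weights, means


def support_bounds(weights, means, itemset):
    """Per-cluster Fréchet bounds, combined with the cluster weights."""
    m = len(itemset)
    lower = sum(w * max(0.0, sum(c[i] for i in itemset) - (m - 1))
                for w, c in zip(weights, means))
    upper = sum(w * min(c[i] for i in itemset)
                for w, c in zip(weights, means))
    return lower, upper


def support_estimate(weights, means, itemset):
    """Estimate assuming items are independent within each cluster."""
    return sum(w * prod(c[i] for i in itemset)
               for w, c in zip(weights, means))


def exact_support(points, itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(all(x[i] for i in itemset) for x in points) / len(points)


# Toy transactions over 3 items; labels are a hypothetical clustering.
points = [[1, 1, 0], [1, 1, 0], [1, 0, 0],
          [0, 0, 1], [0, 1, 1], [0, 0, 1]]
labels = [0, 0, 0, 1, 1, 1]
weights, means = cluster_model(points, labels, k=2)

for itemset in [(0, 1), (1, 2), (0, 1, 2)]:
    lo, hi = support_bounds(weights, means, itemset)
    est = support_estimate(weights, means, itemset)
    print(itemset, lo, est, hi, exact_support(points, itemset))
```

Because the per-cluster minimum can only shrink when an item is added to the itemset, the upper bound in this sketch never increases for a superset, which is the downward closure behavior the abstract refers to.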
Keywords :
computational complexity; data mining; pattern clustering; sparse matrices; statistical analysis; K-means clustering algorithm; association rule support estimation; binary data; data mining; sparse matrix; statistical analysis; time complexity; Association rules; Clustering algorithms; Computer science; Conferences; Data mining; Itemsets; Machine learning; Statistics; Taxonomy; USA Councils; association rules; bound; clustering; support;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.47
Filename :
4733985