DocumentCode :
3249491
Title :
Efficient progressive sampling for association rules
Author :
Parthasarathy, Srinivasan
Author_Institution :
Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH, USA
fYear :
2002
fDate :
2002
Firstpage :
354
Lastpage :
361
Abstract :
In data mining, sampling has often been suggested as an effective tool to reduce the size of the dataset operated on, at some cost to accuracy. However, this loss of accuracy is often difficult to measure and characterize, since the exact nature of the learning curve (accuracy vs. sample size) is parameter and data dependent; i.e., we do not know a priori what sample size is needed to achieve a desired accuracy on a particular dataset for a particular set of parameters. In this article we propose the use of progressive sampling to determine the required sample size for association rule mining. We first show that a naive application of progressive sampling is not very efficient for association rule mining. We then present a refinement based on equivalence classes that seems to work extremely well in practice and is able to converge to the desired sample size very quickly and very accurately. An additional novelty of our approach is the definition of a support-sensitive, interactive measure of accuracy across progressive samples.
Keywords :
data mining; equivalence classes; fractals; association rules; dataset; progressive sampling; rule mining; Association rules; Costs; Data mining; Databases; Delay; Information science; Loss measurement; Pressing; Sampling methods; Size measurement
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1183923
Filename :
1183923