DocumentCode :
2335272
Title :
Interestingness preprocessing
Author :
Sahar, Sigal
Author_Institution :
Tel Aviv Univ., Israel
fYear :
2001
fDate :
2001
Firstpage :
489
Lastpage :
496
Abstract :
As the size of databases increases, the number of rules mined from them also increases, often to an extent that overwhelms users. To address this problem, an important part of the knowledge discovery in databases (KDD) process is dedicated to determining which of these patterns are interesting. In this paper, we define the interestingness pre-processing (IPP) step and introduce a new framework for interestingness analysis. In a similar fashion to data pre-processing, this pre-processing should always be applied prior to interestingness processing. A strict requirement, and the biggest challenge, in defining IPP techniques is that the pre-processing does not eliminate any potentially interesting patterns. That is, the pre-processing methods must be domain-, task- and user-independent. This property differentiates the pre-processing methods from existing interestingness criteria and, since they can be applied automatically, makes them very useful. This generic nature also makes them rare: pre-processing methods are very challenging to define. We define the first two IPP techniques (overfitting and transition) and present empirical results of applying them to six databases. The results indicate that the IPP step is very powerful: in most cases, an average of half the rules mined were eliminated by the application of the two IPP techniques. These results are particularly significant since no user interaction is required to achieve them.
Keywords :
data mining; importance sampling; very large databases; database size; domain-independent methods; interestingness analysis framework; interestingness pre-processing; knowledge discovery; overfitting; potentially interesting patterns; rule mining; task-independent methods; transition; user-independent methods; Association rules; Data mining; Databases; Humans
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
Type :
conf
DOI :
10.1109/ICDM.2001.989556
Filename :
989556