Title :
Toward multidatabase mining: identifying relevant databases
Author :
Liu, Huan ; Lu, Hongjun ; Yao, Jun
Author_Institution :
Dept. of Comput. Sci., Arizona State Univ., Tempe, AZ, USA
Abstract :
Various tools and systems for knowledge discovery and data mining have been developed and are available for applications. However, when there are many databases, an immediate question is where one should start mining. Having more databases does not by itself improve data mining; it helps only when the databases involved are relevant to the task at hand. Breaking away from the conventional data mining assumption that many databases should be joined into one, we argue that the first step in multidatabase mining is to identify the databases most relevant to an application; without this step, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks whose objective is to find patterns or regularities in certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.
Keywords :
data mining; distributed databases; knowledge discovery; multidatabase mining; pattern finding; regularity finding; relevant database identification; Database systems; Pressing; Statistics; Surges
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on