DocumentCode :
1305449
Title :
A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases
Author :
Yen, Luh ; Saerens, Marco ; Fouss, François
Author_Institution :
Machine Learning Group (MLG), Univ. Catholique de Louvain (UCL), Louvain-La-Neuve, Belgium
Volume :
23
Issue :
4
fYear :
2011
fDate :
4/1/2011 12:00:00 AM
Firstpage :
481
Lastpage :
495
Abstract :
This work introduces a link analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random walk model through the database, defining a Markov chain with as many states as there are elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller Markov chain, containing only the elements of interest and preserving the main characteristics of the initial chain, is extracted by stochastic complementation. In a second step, this reduced chain is analyzed by jointly projecting the elements of interest onto the diffusion map subspace and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined, and to multiple correspondence analysis when the database takes the form of a simple star schema. In addition, a kernel version of the diffusion map distance, generalizing the basic diffusion map distance to directed graphs, is introduced, and the links with spectral clustering are discussed. Several data sets are analyzed using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
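The first step described in the abstract, stochastic complementation, can be illustrated with a minimal sketch. This is not the authors' code: the function name `stochastic_complement`, the five-node toy graph, and the choice of which nodes to keep are all assumptions made for illustration. The sketch uses the standard block formula S = P11 + P12 (I - P22)^(-1) P21, where block 1 holds the states of interest and block 2 holds the states to be eliminated.

```python
import numpy as np

def stochastic_complement(P, keep):
    """Reduce a row-stochastic matrix P to the states listed in `keep`.

    Hypothetical helper, for illustration only. Uses the standard formula
    S = P11 + P12 (I - P22)^(-1) P21.
    """
    P = np.asarray(P, dtype=float)
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(P.shape[0]), keep)

    P11 = P[np.ix_(keep, keep)]   # kept -> kept
    P12 = P[np.ix_(keep, drop)]   # kept -> eliminated
    P21 = P[np.ix_(drop, keep)]   # eliminated -> kept
    P22 = P[np.ix_(drop, drop)]   # eliminated -> eliminated

    # Solve (I - P22) X = P21 instead of forming the inverse explicitly.
    X = np.linalg.solve(np.eye(len(drop)) - P22, P21)
    return P11 + P12 @ X

# Toy undirected graph (an assumption, not data from the paper):
# nodes 0-2 play the role of the elements of interest,
# nodes 3-4 play the role of intermediate elements linking them.
A = np.array([
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 1, 0],
], dtype=float)

# Random walk on the graph: normalize each row of the adjacency matrix.
P = A / A.sum(axis=1, keepdims=True)

keep = [0, 1, 2]
S = stochastic_complement(P, keep)
print(S)
```

The reduced matrix `S` is again row-stochastic, and its stationary distribution is the stationary distribution of the full chain restricted to the kept states and renormalized. The second step of the abstract (projection onto the diffusion map subspace) would then operate on `S`; it is not shown here.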
Keywords :
Markov processes; data analysis; data mining; pattern clustering; relational databases; Markov chain; correspondence analysis; diffusion map distance; diffusion map subspace; link analysis procedure; random walk model; relational database mining; relationships extraction; spectral clustering; stochastic complementation extraction; Graph mining; correspondence analysis; diffusion map; dimensionality reduction; kernel on a graph; link analysis; statistical relational learning
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.142
Filename :
5557876