DocumentCode :
34157
Title :
Canonical Correlation Analysis on Data With Censoring and Error Information
Author :
Sun, Jianyong ; Keates, Simeon
Author_Institution :
Sch. of Eng., Comput. & Appl. Math., Univ. of Abertay Dundee, Dundee, UK
Volume :
24
Issue :
12
fYear :
2013
fDate :
Dec. 2013
Firstpage :
1909
Lastpage :
1919
Abstract :
We developed a probabilistic model for canonical correlation analysis for the case in which the associated datasets are incomplete. This case arises when data entries either contain measurement errors or are censored (i.e., nonignorably missing) owing to uncertainties in instrument calibration and to physical limitations of devices and experimental conditions. The aim of the model is to estimate the true correlation coefficients by eliminating the effects of measurement errors and extracting useful information from censored data. Because exact inference is not possible for the proposed model, we developed a modified variational expectation-maximization (EM) algorithm in which the posteriors of the latent variables are approximated as normal distributions. In the experiments, the accuracy of the modified E-step approximation was first demonstrated empirically by comparison with hybrid Monte Carlo (HMC) sampling. Further experiments on synthetic datasets with different numbers of censored entries and different correlation coefficient settings compared the proposed algorithm with a maximum a posteriori (MAP) solution and a Markov chain-EM solution. The results showed that the variational EM solution compares favorably with the MAP solution and approaches the accuracy of the Markov chain-EM solution while maintaining computational simplicity. Finally, we applied the proposed algorithm to find the properties of galaxy groups that are most correlated with X-ray luminosity.
Keywords :
Markov processes; astronomical spectra; astronomy computing; clusters of galaxies; data analysis; expectation-maximisation algorithm; normal distribution; variational techniques; HMC sampling; MAP solution; Markov chain-EM solution; X-ray luminosity; canonical correlation analysis; censored data; censoring; device physical limitation; error information; galaxy group; hybrid Monte Carlo sampling; instrument calibration uncertainties; latent variables; maximum a posteriori solution; measurement error; modified E-step approximation accuracy; nonignorable missing data; probabilistic model; variational EM algorithm; variational expectation-maximization algorithm; Approximation algorithms; Approximation methods; Correlation; Data models; Inference algorithms; Measurement errors; Probabilistic logic; Canonical correlation analysis (CCA); latent variable model; measurement errors
fLanguage :
English
Journal_Title :
Neural Networks and Learning Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
2162-237X
Type :
jour
DOI :
10.1109/TNNLS.2013.2262949
Filename :
6557493