DocumentCode :
2616
Title :
High Dimensional Semiparametric Scale-Invariant Principal Component Analysis
Author :
Fang Han ; Han Liu
Author_Institution :
Dept. of Biostat., Johns Hopkins Univ., Baltimore, MD, USA
Volume :
36
Issue :
10
fYear :
2014
fDate :
Oct. 2014
Firstpage :
2016
Lastpage :
2032
Abstract :
We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the data follow a multivariate Gaussian distribution. COCA improves upon PCA and sparse PCA in three aspects: (i) it is robust to modeling assumptions; (ii) it is robust to outliers and data contamination; (iii) it is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.
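The abstract does not spell out the estimator. The following is a minimal sketch that assumes the rank-based approach commonly used for the nonparanormal (Gaussian copula) model. The latent correlation matrix is estimated from Kendall's tau through the identity rho = sin(pi/2 * tau), which holds for Gaussian pairs. A sparse leading component is then extracted by a truncated power iteration. The helper names (`kendall_tau`, `latent_correlation`, `truncated_power`), the sparsity level `k`, and the synthetic demo data are illustrative assumptions, not the authors' code.

```python
import math
import random


def sign(v):
    """Return +1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau-a of two equal-length sequences (O(n^2), no tie correction)."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return total / (n * (n - 1) / 2)


def latent_correlation(columns):
    """Rank-based estimate of the latent Gaussian correlation matrix.

    Uses sin(pi/2 * tau), which maps Kendall's tau to the Pearson correlation
    under a Gaussian copula. Only the ranks within each column matter, so any
    strictly increasing transform of a column leaves the result unchanged.
    """
    d = len(columns)
    r = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            value = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
            r[j][k] = r[k][j] = value
    return r


def truncated_power(matrix, k, iters=100):
    """Hypothetical sparse-PCA step: power iteration keeping the k largest entries."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        # keep the k largest-magnitude coordinates and zero out the rest
        keep = set(sorted(range(d), key=lambda i: abs(w[i]), reverse=True)[:k])
        w = [w[i] if i in keep else 0.0 for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: the iterate collapsed to zero
        v = [x / norm for x in w]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Latent Gaussian data: variables 0-2 share the factor z, 3-4 are pure noise.
    latent = [[zi + 0.3 * rng.gauss(0.0, 1.0) for zi in z] for _ in range(3)]
    latent += [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]
    # Observed data: unknown monotone marginal transforms (nonparanormal model).
    observed = [latent[0], [math.exp(t) for t in latent[1]],
                [t ** 3 for t in latent[2]], latent[3], latent[4]]
    loadings = truncated_power(latent_correlation(observed), k=3)
    print([round(x, 3) for x in loadings])
```

Because `latent_correlation` depends on the data only through within-column ranks, the `exp` and cube transforms in the demo leave the estimated matrix unchanged. This illustrates, in sketch form, the scale invariance listed as item (iii).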
Keywords :
Gaussian distribution; principal component analysis; COCA; copula component analysis; data contamination; estimation rates; feature selection; high dimensional semiparametric scale-invariant principal component analysis; monotone transformations; multivariate Gaussian distribution; semiparametric model; sparse PCA; Convergence; Correlation; Covariance matrices; Equations; Mathematical model; Principal component analysis; Vectors; High dimensional statistics; nonparanormal distribution; robust statistics
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2014.2307886
Filename :
6747357