• DocumentCode
    2616
  • Title

    High Dimensional Semiparametric Scale-Invariant Principal Component Analysis

  • Author

    Fang Han ; Han Liu

  • Author_Institution
    Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
  • Volume
    36
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    2016
  • Lastpage
    2032
  • Abstract
    We propose a new high dimensional semiparametric principal component analysis (PCA) method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. COCA improves upon PCA and sparse PCA in three aspects: (i) It is robust to modeling assumptions; (ii) It is robust to outliers and data contamination; (iii) It is scale-invariant and yields more interpretable results. We prove that the COCA estimators obtain fast estimation rates and are feature selection consistent when the dimension is nearly exponentially large relative to the sample size. Careful experiments confirm that COCA outperforms sparse PCA on both synthetic and real-world data sets.
  • Keywords
    Gaussian distribution; principal component analysis; COCA; copula component analysis; data contamination; estimation rates; feature selection; high dimensional semiparametric scale-invariant principal component analysis; monotone transformations; multivariate Gaussian distribution; semiparametric model; sparse PCA; Convergence; Correlation; Covariance matrices; Equations; Mathematical model; Vectors; High dimensional statistics; nonparanormal distribution; robust statistics
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type

    jour

  • DOI
    10.1109/TPAMI.2014.2307886
  • Filename
    6747357
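
The abstract's model (Gaussian after unspecified monotone marginal transformations) and its scale-invariance claim can be illustrated with a minimal sketch. The abstract does not spell out the estimator, so everything below is an assumption made for illustration: the latent correlation matrix is estimated from Spearman's rho through the Gaussian-copula relation r = 2 sin(pi * rho / 6), and the leading component is extracted by plain power iteration with no sparsity constraint. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_correlation_matrix(columns):
    """Assumed estimator: Spearman's rho mapped through 2*sin(pi*rho/6).

    Only ranks enter the computation, so any strictly increasing
    transformation of a column leaves the result unchanged.
    """
    ranked = [ranks(c) for c in columns]
    d = len(columns)
    matrix = [[1.0] * d for _ in range(d)]
    for j in range(d):
        for k in range(j + 1, d):
            rho = pearson(ranked[j], ranked[k])
            matrix[j][k] = matrix[k][j] = 2.0 * math.sin(math.pi * rho / 6.0)
    return matrix


def leading_eigenvector(matrix, iterations=500):
    """Dense power iteration (no sparsity), started from the uniform vector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: two variables, five observations each.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
R = rank_correlation_matrix([x, y])
direction = leading_eigenvector(R)
```

Because the estimate depends on the data only through ranks, replacing x by exp(x) or y by y**3 yields the same matrix R, which is the sense in which a rank-based approach is scale-invariant. A sparse variant would swap the power iteration for a sparsity-constrained eigenvector solver.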