• DocumentCode
    1785225
  • Title
    Sparse gene expression data analysis based on truncated power
  • Author
    Ningmin Shen ; Jing Li ; Cheng Jin ; Peiyun Zhou
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Nanjing Univ. of Aeronaut. & Astronaut., Nanjing, China
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    39
  • Lastpage
    44
  • Abstract
    Cluster analysis has become a popular method for analyzing gene expression data, where the resulting class labels can support fast and accurate disease diagnosis. However, gene expression data have many attributes and few samples, which produces a large amount of redundant or noisy information and lowers the accuracy of clustering applied directly to the high-dimensional data. Principal Component Analysis (PCA) is a classical dimension reduction method that transforms high-dimensional data into a low-dimensional space. Its shortcoming is weak interpretability, because the loadings are not sparse. In this paper, a sparse PCA method based on Truncated Power, which minimizes the cardinality of the loadings while maximizing the percentage of variance explained by the principal components (PCs), is applied to feature extraction for gene expression data; the sparse PCs are then fed into a K-means process for clustering. Experimental results on three typical gene datasets verify that the sparse gene data can improve the efficiency and accuracy of clustering analysis.
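    The pipeline described in the abstract (sparse loadings via a truncated power iteration, then K-means on the sparse PCs) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the cardinality parameter `k`, the projection deflation step, the plain Lloyd-style K-means, and the synthetic data are all assumptions introduced here.

```python
import numpy as np

def truncated_power(A, k, n_iter=200, tol=1e-8):
    """Approximate leading k-sparse eigenvector of a symmetric PSD matrix A."""
    p = A.shape[0]
    # Assumed initialisation: the coordinate with the largest variance.
    x = np.zeros(p)
    x[np.argmax(np.diag(A))] = 1.0
    for _ in range(n_iter):
        y = A @ x                              # power step
        keep = np.argsort(np.abs(y))[-k:]      # indices of the k largest magnitudes
        x_new = np.zeros(p)
        x_new[keep] = y[keep]                  # truncation: zero everything else
        norm = np.linalg.norm(x_new)
        if norm == 0:
            break
        x_new /= norm
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x

def sparse_pca(X, n_components, k):
    """Sparse loadings by repeated truncated power plus projection deflation."""
    Xc = X - X.mean(axis=0)
    A = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    p = A.shape[0]
    loadings = np.zeros((p, n_components))
    for j in range(n_components):
        v = truncated_power(A, k)
        loadings[:, j] = v
        P = np.eye(p) - np.outer(v, v)         # assumed deflation: project out v
        A = P @ A @ P
    return Xc @ loadings, loadings

def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd iterations on the rows of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in for gene expression data: 40 samples, 200 "genes",
# two groups that differ only in the first 5 genes (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
X[:20, :5] += 4.0

scores, loadings = sparse_pca(X, n_components=2, k=5)
labels = kmeans(scores, n_clusters=2)
```

    Here `k` bounds the number of nonzero entries in each loading vector, which is what makes each sparse PC attributable to a small set of genes; the right value for real datasets would need to be tuned.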
  • Keywords
    bioinformatics; data analysis; feature extraction; genetics; pattern clustering; principal component analysis; K-means process; PCA; cluster analysis; dimension reduction; direct clustering; disease diagnosis; disturbed information; feature extraction method; gene datasets; high-dimension data transform; principal component analysis; redundant information; sparse gene expression data analysis; truncated power; Cancer; Colon; Correlation; Feature extraction; Gene expression; Loading; Principal component analysis; Gene expression data; Truncated Power; feature extraction; sparse principal component analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999385
  • Filename
    6999385