  • DocumentCode
    105766
  • Title

    TotalPLS: Local Dimension Reduction for Multicategory Microarray Data

  • Author

    Wenjie You ; Zijiang Yang ; Mingshun Yuan ; Guoli Ji

  • Author_Institution
    Dept. of Autom., Xiamen Univ., Xiamen, China
  • Volume
    44
  • Issue
    1
  • fYear
    2014
  • fDate
    Feb. 2014
  • Firstpage
    125
  • Lastpage
    138
  • Abstract
    Dimension reduction is an important topic in data mining and is widely used in genetics, medicine, and bioinformatics. We propose a new local dimension reduction algorithm, TotalPLS, that operates in a unified partial least squares (PLS) framework and implements an information fusion of PLS-based feature selection and feature extraction. This paper focuses on extracting the potential structure hidden in high-dimensional multicategory microarray data, and on interpreting and understanding the results provided by that structure information. First, we propose using PLS-based recursive feature elimination (PLSRFE) in multicategory problems. Then, we perform feature importance analysis based on PLSRFE for high-dimensional microarray data to determine the informative feature (biomarker) subset related to the tumor subtypes under study. Finally, PLS-based supervised feature extraction is conducted on the selected gene subset to extract comprehensive features that best reflect the nature of the classification and have discriminating ability. The proposed algorithm is compared with several state-of-the-art methods on multiple high-dimensional multicategory microarray datasets. The comparison is performed in terms of recognition accuracy, relevance, and redundancy. Experimental results show that the proposed algorithm can improve the recognition rate and computational efficiency. Furthermore, mining potential structure information improves the interpretability and understandability of recognition results. The proposed algorithm can be effectively applied to microarray data analysis for the discovery of gene coexpression and coregulation.
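    The two-stage idea described in the abstract (PLS-based recursive feature elimination, then PLS-based feature extraction on the selected subset) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`one_hot`, `pls_fit`, `pls_rfe`, `pls_extract`), the importance score (sum of squared PLS weights), the 20% elimination step, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def one_hot(y):
    """Encode class labels as an indicator matrix (samples x classes)."""
    classes = np.unique(y)
    return (y[:, None] == classes[None, :]).astype(float)

def pls_fit(X, Y, n_components):
    """Basic PLS with X-deflation; returns weights W and loadings P."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    W, P = [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of X^T Y.
        U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
        w = U[:, 0]
        t = Xc @ w                      # score vector
        tt = t @ t
        if tt < 1e-12:                  # nothing left to explain
            break
        p = Xc.T @ t / tt               # X loading
        Xc = Xc - np.outer(t, p)        # deflate X
        W.append(w)
        P.append(p)
    return np.column_stack(W), np.column_stack(P)

def pls_rfe(X, y, n_keep, n_components=2, drop_frac=0.2):
    """Recursive feature elimination ranked by PLS weights (assumed criterion)."""
    Y = one_hot(y)
    active = np.arange(X.shape[1])
    while active.size > n_keep:
        W, _ = pls_fit(X[:, active], Y, n_components)
        importance = (W ** 2).sum(axis=1)
        n_drop = min(max(1, int(drop_frac * active.size)), active.size - n_keep)
        keep = np.argsort(importance)[n_drop:]   # drop the least important
        active = np.sort(active[keep])
    return active

def pls_extract(X, y, n_components=2):
    """Fit PLS on the selected features; return a projection function."""
    W, P = pls_fit(X, one_hot(y), n_components)
    R = W @ np.linalg.inv(P.T @ W)      # rotations mapping centered X to scores
    mean = X.mean(axis=0)
    return lambda X_new: (X_new - mean) @ R

# Synthetic stand-in for microarray data: 60 samples, 50 "genes", 3 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 20)
X = rng.standard_normal((60, 50))
X[y == 1, :3] += 4.0                    # genes 0-2 separate class 1
X[y == 2, 3:6] += 4.0                   # genes 3-5 separate class 2

selected = pls_rfe(X, y, n_keep=6)                  # stage 1: feature selection
project = pls_extract(X[:, selected], y)            # stage 2: feature extraction
scores = project(X[:, selected])                    # low-dimensional features
print("selected features:", selected)
print("extracted feature matrix shape:", scores.shape)
```

    The extracted `scores` could then be passed to any classifier; the choice of classifier and the number of PLS components are left open here because the record does not specify them.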
  • Keywords
    biology computing; data analysis; data mining; feature extraction; learning (artificial intelligence); least squares approximations; pattern classification; PLS framework; PLS-based feature selection; PLS-based recursive feature elimination; PLS-based supervised feature extraction; PLSRFE; TotalPLS; bioinformatics; classification nature; gene coexpression; gene coregulation; genetics; high-dimensional multicategory microarray data; information fusion; local dimension reduction algorithm; medicine; microarray data analysis; potential structure information; unified partial least squares framework; Accuracy; Educational institutions; Training; Vectors; Dimension reduction; feature selection; partial least squares (PLS)
  • fLanguage
    English
  • Journal_Title
    Human-Machine Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    2168-2291
  • Type

    jour

  • DOI
    10.1109/THMS.2013.2288777
  • Filename
    6672009