• DocumentCode
    1359074
  • Title
    PLS-Based Gene Selection and Identification of Tumor-Specific Genes
  • Author
    Ji, Guoli ; Yang, Zijiang ; You, Wenjie
  • Author_Institution
    Dept. of Autom., Xiamen Univ., Xiamen, China
  • Volume
    41
  • Issue
    6
  • fYear
    2011
  • Firstpage
    830
  • Lastpage
    841
  • Abstract
    Identifying tumor-specific genes from microarray data involves high-dimensional small samples, strong relevance among genes, and high noise. In view of these characteristics, a novel partial least squares (PLS) based gene-selection method, which incorporates genetic relatedness and is suitable for multicategory classification, is presented. Using the differences in how well the independent variables explain the dependent variable (class), we define three indicators for global gene selection, which take into account the combined effects of all the genes and the correlation among the genes. Integrated with the linear kernel support vector classifier (SVC), the proposed method is tested on the MIT acute myeloid leukemia/acute lymphoblastic leukemia (AML/ALL) and small round blue cell tumors (SRBCT) data sets. A small subset of specific genes with high identification power is obtained. The results indicate that the proposed PLS-based method for tumor-specific gene selection is highly efficient. Comparison with the literature indicates that the specific genes selected from both the two-category AML/ALL dataset and the multicategory SRBCT dataset are credible. Further investigation shows that the proposed gene-selection method is robust. Overall, the proposed method can effectively solve the feature-selection problem for high-dimensional small samples, and it also performs well for multicategory classification.
  • Keywords
    biology computing; diseases; genetics; least squares approximations; pattern classification; support vector machines; tumours; AML/ALL; MIT acute myeloid leukemia; PLS based gene-selection method; PLS-based gene selection; PLS-based method; SRBCT data sets; SVC; acute lymphoblastic leukemia; feature-selection problem; genetic relatedness; global gene selection; high-dimensional small sample; identification; linear kernel support vector classifier; microarray; multicategory classification; multicategory dataset SRBCT; partial least squares; small round blue cell tumors data sets; tumor-specific genes selection; Gene expression; Least squares methods; Mathematical model; Tumors; Gene selection; high-dimensional small samples; partial least squares (PLS); tumor-specific gene;
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1094-6977
  • Type
    jour
  • DOI
    10.1109/TSMCC.2010.2078503
  • Filename
    5607317