• DocumentCode
    3195516
  • Title

    Dimension reduction for p53 protein recognition by using incremental partial least squares

  • Author

    Xue-Qiang Zeng ; Guo-Zheng Li

  • Author_Institution
    Comput. Center, Nanchang Univ., Nanchang, China
  • fYear
    2013
  • fDate
    18-21 Dec. 2013
  • Firstpage
    381
  • Lastpage
    385
  • Abstract
    p53 is an important tumor suppressor protein. Mutated p53 has been found in many kinds of human cancers, and it has been found that restoring active p53 would lead to tumor regression. In recent years, more and more data have been extracted from biophysical simulations, so modelling mutant p53 transcriptional activity suffers from a huge number of instances and a very high feature dimension. Incremental feature extraction is an effective way to facilitate the analysis of large-scale big data. However, most current incremental feature extraction methods are not suitable for processing big data with a high feature dimension. In addition, feature extraction methods should improve the performance of the subsequent classification. Therefore, incremental feature extraction methods need to be more efficient and effective. Partial Least Squares (PLS) has been demonstrated to be an effective dimension reduction technique for classification, but how to apply PLS to big data is still an open problem. In this paper, we design a highly efficient and powerful algorithm named Incremental Partial Least Squares (IPLS), which conducts a two-stage extraction process. In the first stage, the PLS target function is adapted to be incremental by updating the historical mean, in order to extract the leading projection direction. In the second stage, the other projection directions are calculated through the equivalence between the PLS vectors and the Krylov sequence. We compare IPLS with some state-of-the-art incremental feature extraction methods, namely Incremental Principal Component Analysis, Incremental Maximum Margin Criterion and Incremental Inter-class Scatter, on real p53 protein data. Empirical results show that IPLS performs better than the other methods in terms of balanced classification accuracy.
  • Keywords
    data analysis; feature extraction; least squares approximations; medical computing; pattern classification; principal component analysis; proteins; tumours; IPLS; Incremental Inter-class Scatter; Incremental Maximum Margin Criterion; Incremental Partial Least Squares; Incremental Principal Component Analysis; Krylov sequence; balanced classification accuracy; biophysical simulations; data extraction; data processing; dimension reduction technique; human cancers; incremental feature extraction; incremental partial least squares; mutated p53; p53 protein recognition; transcriptional activity; tumor regression; tumor suppressor protein; two-stage extraction process; Cancer; Data handling; Data storage systems; Feature extraction; Information management; Proteins; Vectors; Big Data; Feature Extraction; Incremental Learning; Partial Least Squares; p53 Protein;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Type

    conf

  • DOI
    10.1109/BIBM.2013.6732522
  • Filename
    6732522
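
The two-stage process described in the abstract can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' IPLS implementation: it assumes a single numeric label per sample, takes the leading direction to be the normalized centered cross-product X^T y (the usual first PLS weight vector), updates it one sample at a time with a running mean, and builds the remaining directions by orthogonalizing a Krylov sequence against a batch data matrix. The names IncrementalLeadingDirection and krylov_directions are hypothetical.

```python
import numpy as np


class IncrementalLeadingDirection:
    """Running estimate of the first PLS weight vector, w1 proportional to Xc^T yc.

    Assumption: one numeric label per sample (e.g. 0/1 for two classes).
    """

    def __init__(self, n_features):
        self.n = 0
        self.mean_x = np.zeros(n_features)  # historical feature mean
        self.mean_y = 0.0                   # historical label mean
        self.cross = np.zeros(n_features)   # running centered cross-product

    def update(self, x, y):
        """Fold one sample into the running means and cross-product."""
        self.n += 1
        dx = x - self.mean_x                  # deviation from the OLD feature mean
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.cross += dx * (y - self.mean_y)  # paired with the NEW label mean

    def direction(self):
        """Unit-length leading projection direction."""
        return self.cross / np.linalg.norm(self.cross)


def krylov_directions(Xc, s, n_components):
    """Orthonormal basis of span{s, Cs, C^2 s, ...} with C = Xc^T Xc.

    Each new vector is C applied to the previous basis vector and then
    orthogonalized (Gram-Schmidt), which spans the same Krylov subspace.
    Assumption: Xc is a centered batch matrix; a streaming variant would
    need its own way of applying C.
    """
    basis = []
    v = s.astype(float)
    for _ in range(n_components):
        for b in basis:
            v = v - (b @ v) * b       # remove components along earlier directions
        v = v / np.linalg.norm(v)
        basis.append(v)
        v = Xc.T @ (Xc @ v)           # next Krylov vector, without forming C
    return np.column_stack(basis)


# Synthetic demonstration data (not p53 data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=60) > 0).astype(float)

est = IncrementalLeadingDirection(n_features=8)
for xi, yi in zip(X, y):
    est.update(xi, yi)

Xc = X - X.mean(axis=0)
W = krylov_directions(Xc, est.cross, n_components=3)
Z = Xc @ W  # reduced 3-dimensional representation for a downstream classifier
```

After all samples have been seen, the streamed direction should agree with the batch quantity computed from the centered data, and the columns of W are orthonormal. Any classifier can then be trained on Z.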