• DocumentCode
    3609101
  • Title
    PETs: A Stable and Accurate Predictor of Protein-Protein Interacting Sites Based on Extremely-Randomized Trees
  • Author
    Bin Xia ; Hong Zhang ; Qianmu Li ; Tao Li
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Nanjing Univ. of Sci. & Technol., Nanjing, China
  • Volume
    14
  • Issue
    8
  • fYear
    2015
  • Firstpage
    882
  • Lastpage
    893
  • Abstract
    Protein-protein interactions (PPIs) play crucial roles in various biological processes. Many methods aim to identify whether a protein contains interaction residues, but it is often more important to identify the individual interacting amino acids. In practical applications, the stability of a prediction model is as important as its accuracy. However, random sampling, which is widely used in previous prediction models, often causes large differences between trained models. In this paper, a Predictor of protein-protein interaction sites based on Extremely-randomized Trees (PETs) is proposed to improve prediction accuracy while maintaining prediction stability. In PETs, a cluster-based sampling strategy is proposed to ensure model stability: first, the training dataset is divided into subsets using specific features; second, the subsets are clustered using K-means; and finally, samples are selected from each cluster. With this sampling strategy, samples that have different types of significant features can be selected independently from different clusters. The evaluation shows that PETs achieves better accuracy while maintaining good stability. The source code and toolkit are available at https://github.com/BinXia/PETs.
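The three sampling steps named in the abstract (split by feature, cluster with K-means, select from each cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads "divided into subsets using specific features" as partitioning samples by one discrete feature, and the names `kmeans`, `cluster_based_sample`, `subset_key`, `k`, and `per_cluster` are hypothetical, as is the toy data. The extremely-randomized trees classifier itself is not shown.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on equal-length numeric tuples.

    Returns one cluster index per point. Illustrative only; the record
    does not say which K-means variant or settings the authors used.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires len(points) >= k
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_based_sample(samples, subset_key, k=2, per_cluster=1, seed=0):
    """Return sorted indices of samples chosen cluster by cluster.

    subset_key is a hypothetical callable mapping a sample to the value
    used to split the training set (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    # Step 1: divide the training set into subsets using a chosen feature.
    subsets = {}
    for i, s in enumerate(samples):
        subsets.setdefault(subset_key(s), []).append(i)
    chosen = []
    for key in sorted(subsets):
        idx = subsets[key]
        kk = min(k, len(idx))
        # Step 2: cluster each subset with K-means.
        labels = kmeans([samples[i] for i in idx], kk, seed=seed)
        # Step 3: select up to per_cluster samples from every cluster.
        for c in range(kk):
            members = [i for i, lab in zip(idx, labels) if lab == c]
            chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return sorted(chosen)


# Toy data: the first value is a discrete feature used for the split,
# the remaining two are continuous features.
samples = [
    (0, 0.0, 0.1), (0, 0.2, 0.0), (0, 5.0, 5.1), (0, 5.2, 5.0),
    (1, 0.1, 0.0), (1, 0.0, 0.2), (1, 9.0, 9.1), (1, 9.2, 9.0),
]
picked = cluster_based_sample(samples, subset_key=lambda s: s[0], k=2, per_cluster=1)
print(picked)
```

Because every cluster contributes to the selection, repeated draws cover the same regions of feature space, which is the intuition behind the stability claim; plain random sampling gives no such guarantee.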
  • Keywords
    biology computing; feature selection; molecular biophysics; molecular clusters; molecular configurations; proteins; sampling methods; trees (mathematics); K-means clustering; PETs; amino acid; biological processes; cluster-based sampling strategy; extremely-randomized trees; interaction residues; prediction stability; predictor-of-protein-protein interacting sites; random sampling; training dataset; Amino acids; Feature extraction; Positron emission tomography; Proteins; Solvents; Stability analysis; Training; ETs; sampling strategy; stability
  • fLanguage
    English
  • Journal_Title
    NanoBioscience, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1536-1241
  • Type
    jour
  • DOI
    10.1109/TNB.2015.2491303
  • Filename
    7308048