• DocumentCode
    3459906
  • Title
    Improving Prediction of the Contact Numbers of Residues in Proteins from Primary Sequences
  • Author
    Dong, Qiwen ; Zhou, Shuigeng ; Guan, Jihong
  • Author_Institution
    Shanghai Key Lab. of Intell. Inf. Process., Fudan Univ., Shanghai, China
  • fYear
    2009
  • fDate
    3-5 Aug. 2009
  • Firstpage
    251
  • Lastpage
    254
  • Abstract
    Contact number is one kind of one-dimensional protein feature. Knowing the number of residue contacts in a protein is crucial for deriving constraints useful in protein structure prediction. In this study, we evaluate and compare several methods and different features for contact number prediction. The experiments are performed on a nonredundant dataset containing 1109 proteins. Contact number prediction is formulated as a multi-class classification problem, and three-fold cross validation is used to measure the performance of various methods with different combinations of input features. The experimental results show that the profile feature, which contains evolutionary information about proteins, achieves better performance than the raw amino acid sequence. Further improvement is achieved by adding the predicted secondary structure and relative solvent accessibility as features. In all experiments, every tested method improves performance by more than 10 percent over the baseline method. The best Q score for two-class classification is 79.7%, which is 2 percent higher than the best result previously reported in the literature. These results can provide valuable information for applications such as protein structure reconstruction and model quality assessment.
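The abstract does not define contact number or the two-class split, so the sketch below rests on stated assumptions: a residue's contact number is taken to be the count of other residues whose C-alpha atoms lie within 8 Å of it, with a sequence separation of at least 2, and the two classes come from a single hypothetical threshold. It illustrates how observed labels, the Q score (fraction of residues classified correctly) and a three-fold partition of proteins could be computed; it is not the authors' pipeline, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_numbers(ca_coords: Sequence[Coord], cutoff: float = 8.0,
                    min_sep: int = 2) -> List[int]:
    """Assumed definition: for each residue i, count residues j with
    |i - j| >= min_sep whose C-alpha atoms are within `cutoff` angstroms."""
    n = len(ca_coords)
    counts = [0] * n
    for i in range(n):
        # Only look forward from i + min_sep; each contact updates both ends.
        for j in range(i + min_sep, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts


def to_two_class(counts: Sequence[int], threshold: float) -> List[int]:
    """Hypothetical labeling: 1 ("high contact") if count >= threshold, else 0."""
    return [1 if c >= threshold else 0 for c in counts]


def q_score(predicted: Sequence[int], observed: Sequence[int]) -> float:
    """Q score: fraction of residues whose predicted class equals the observed class."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("predicted and observed must be non-empty and equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def three_fold_splits(n_proteins: int) -> List[Tuple[List[int], List[int]]]:
    """Partition protein indices into 3 folds; each fold is the test set once."""
    folds = [list(range(k, n_proteins, 3)) for k in range(3)]
    splits = []
    for k in range(3):
        test = folds[k]
        train = [i for f in range(3) if f != k for i in folds[f]]
        splits.append((train, test))
    return splits
```

For example, five residues on a straight line spaced 3.8 Å apart give `contact_numbers(...) == [1, 1, 2, 1, 1]`. Splitting by protein rather than by residue keeps all residues of one chain in the same fold, which avoids leaking near-identical local contexts between training and test sets.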
  • Keywords
    bioinformatics; evolution (biological); molecular biophysics; pattern classification; proteins; support vector machines; Q score; amino acid sequence; base-line method; model quality assessment; multiclass classification problem; nonredundant dataset; protein evolutionary information; protein structure prediction; residue contact number prediction; secondary structure; support vector machine; three-fold cross validation; two-class classification; Amino acids; Atomic measurements; Bioinformatics; Computer science; Entropy; Predictive models; Proteins; Sequences; Solvents; Support vector machines; Conditional random field; Contact number prediction; Maximum entropy model; Support vector machine;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Bioinformatics, Systems Biology and Intelligent Computing, 2009. IJCBS '09. International Joint Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3739-9
  • Type
    conf
  • DOI
    10.1109/IJCBS.2009.39
  • Filename
    5260676