• DocumentCode
    1398667
  • Title

    Prediction of membrane protein types by using dipeptide and pseudo amino acid composition-based composite features

  • Author

    Hayat, M. ; Khan, Ajmal

  • Author_Institution
    DCIS, Pakistan Inst. of Eng. & Appl. Sci., Islamabad, Pakistan
  • Volume
    6
  • Issue
    18
  • fYear
    2012
  • Firstpage
    3257
  • Lastpage
    3264
  • Abstract
    Membrane proteins are fundamental elements of a cell that play essential roles in nearly all cellular processes. Determining membrane protein types through biological experiments is often complicated and time-consuming. It is therefore highly desirable to develop a robust, reliable and high-throughput in silico method to predict membrane protein types. In this study, the authors use two feature extraction strategies, dipeptide composition and pseudo amino acid (PseAA) composition, to classify membrane protein types. In addition, a composite model is developed by concatenating the dipeptide and PseAA composition-based features. Two feature selection methods, neighbourhood preserving embedding and locally linear embedding (LLE), are then applied to reduce the dimensionality of the composite model. The performance of these feature extraction strategies is evaluated using four different classifiers: K-nearest neighbour, probabilistic neural network (PNN), support vector machine (SVM) and grey incidence degree. The highest success rates are observed using the LLE-based reduced features. SVM yields the best accuracy, 88.2%, in the jackknife test, whereas PNN obtains the highest accuracy, 98.4%, in the independent dataset test. Performance measures other than accuracy are also used, namely the Matthews correlation coefficient, sensitivity and precision. The authors' simulation results show that the composite model significantly discriminates between the types of membrane protein and might be useful for future research and drug discovery.
  • Keywords
    biology computing; biomembranes; cellular biophysics; feature extraction; molecular biophysics; neural nets; proteins; support vector machines; K-nearest neighbour; Matthews correlation coefficient; PNN; PseAA composition; SVM; cellular processes; dipeptide; feature extraction; grey incidence degree; locally linear embedding; membrane protein type prediction; probabilistic neural network; pseudo amino acid; support vector machine;
  • fLanguage
    English
  • Journal_Title
    Communications, IET
  • Publisher
    IET
  • ISSN
    1751-8628
  • Type

    jour

  • DOI
    10.1049/iet-com.2011.0170
  • Filename
    6412956