DocumentCode
1398667
Title
Prediction of membrane protein types by using dipeptide and pseudo amino acid composition-based composite features
Author
Hayat, M. ; Khan, Ajmal
Author_Institution
DCIS, Pakistan Inst. of Eng. & Appl. Sci., Islamabad, Pakistan
Volume
6
Issue
18
fYear
2012
Firstpage
3257
Lastpage
3264
Abstract
Membrane proteins are fundamental elements of a cell that play essential roles in nearly all cellular processes. Predicting membrane protein types through biological experiments is often complicated and time-consuming. It is therefore highly desirable to develop a robust, reliable and high-throughput in silico method to predict membrane protein types. In this study, the authors use two feature extraction strategies, dipeptide composition and pseudo amino acid (PseAA) composition, to classify membrane protein types. In addition, a composite model is developed by concatenating the dipeptide and PseAA composition-based features. Further, two feature selection methods, neighbourhood preserving embedding and locally linear embedding (LLE), are applied to reduce the dimensionality of the composite model. The performance of these feature extraction strategies is evaluated using four different classifiers: K-nearest neighbour, probabilistic neural network (PNN), support vector machine (SVM) and grey incidence degree. The highest success rates were observed using the LLE-based reduced features. SVM yielded the best accuracy, 88.2%, in the jackknife test, whereas PNN obtained the highest accuracy, 98.4%, in the independent dataset test. Performance measures other than accuracy are also used: the Matthews correlation coefficient, sensitivity and precision. The authors' simulation results show that the composite model discriminates significantly between the types of membrane protein and might be useful for future research and drug discovery.
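The dipeptide composition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual definition: a 400-element vector holding the relative frequency of each ordered pair of adjacent standard amino acids. The names dipeptide_composition, AMINO_ACIDS and DIPEPTIDES are hypothetical, and skipping pairs that contain a non-standard residue is an assumption, not a detail taken from the paper. The PseAA features and the concatenation step of the composite model are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered pairs, in a fixed order so every vector is comparable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element list of adjacent-pair relative frequencies."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs containing a non-standard residue are skipped.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # No valid pair (empty or very short sequence): return all zeros.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Illustrative toy sequence, not taken from any dataset in the paper.
features = dipeptide_composition("MKVLAAGIVGL")
print(len(features))
```

Each protein maps to a vector of the same length regardless of sequence length, which is what allows such features to be concatenated with other fixed-length descriptors and passed to a classifier.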
Keywords
biology computing; biomembranes; cellular biophysics; feature extraction; molecular biophysics; neural nets; proteins; support vector machines; K-nearest neighbour; Matthews correlation coefficient; PNN; PseAA composition; SVM; cellular processes; dipeptide; grey incidence degree; locally linear embedding; membrane protein type prediction; probabilistic neural network; pseudo amino acid; support vector machine
fLanguage
English
Journal_Title
Communications, IET
Publisher
iet
ISSN
1751-8628
Type
jour
DOI
10.1049/iet-com.2011.0170
Filename
6412956