DocumentCode
1434597
Title
Identifying Protein-Kinase-Specific Phosphorylation Sites Based on the Bagging–AdaBoost Ensemble Approach
Author
Yu, Zhiwen ; Deng, Zhongkai ; Wong, Hau-San ; Tan, Lirong
Author_Institution
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou, China
Volume
9
Issue
2
fYear
2010
fDate
6/1/2010
Firstpage
132
Lastpage
143
Abstract
Protein phosphorylation is an important step in many biological processes, such as the cell cycle, membrane transport, and apoptosis. To obtain more useful information about protein phosphorylation, a robust, stable, and accurate approach for predicting phosphorylation sites is needed. Although a number of approaches exist for predicting phosphorylation sites, such as those based on neural networks and support vector machines, they use only a single classifier, and their prediction results are in general not very stable or robust. In this paper, we design a new classifier ensemble approach, called the Bagging-AdaBoost ensemble (BAE), for the prediction of eukaryotic protein phosphorylation sites. It incorporates the bagging and AdaBoost techniques into the classifier framework to improve the accuracy, stability, and robustness of the final result. To our knowledge, this is the first time that a combined bagging and boosting ensemble approach has been applied to predict phosphorylation sites. Our prediction system based on BAE focuses on six kinase families: CDK, CK2, MAPK, PKA, PKC, and SRC. BAE achieves good performance on these six families, with prediction accuracies of 0.8634, 0.8721, 0.8542, 0.8537, 0.8052, and 0.7432, respectively.
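The general idea of combining bagging with AdaBoost can be illustrated with a minimal sketch: several AdaBoost models are each trained on a bootstrap resample of the data, and their predictions are combined by majority vote. This is not the authors' BAE implementation. The abstract does not specify the base learner, the sequence features, or the combination rule, so the decision-stump base learner, the majority vote, the function names (train_bae, bae_predict), and the one-dimensional toy data below are all assumptions made for illustration only.

```python
import math
import random


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                # A stump predicts `pol` when x[f] >= t and `-pol` otherwise.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best


def stump_predict(stump, x):
    _, f, t, pol = stump
    return pol if x[f] >= t else -pol


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost over decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the log below stays finite for perfect stumps.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


def train_bae(X, y, n_bags=5, rounds=10, seed=0):
    """Bagging layer: one AdaBoost model per bootstrap resample (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(X)
    bags = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bags.append(train_adaboost([X[i] for i in idx],
                                   [y[i] for i in idx], rounds))
    return bags


def bae_predict(bags, x):
    """Majority vote over the bagged AdaBoost models (assumed combination rule)."""
    votes = sum(adaboost_predict(model, x) for model in bags)
    return 1 if votes >= 0 else -1


# Synthetic toy data, not phosphorylation data: one feature, two classes.
X_toy = [[i / 40.0] for i in range(40)]
y_toy = [-1] * 20 + [1] * 20
toy_model = train_bae(X_toy, y_toy)
print(bae_predict(toy_model, [1.5]), bae_predict(toy_model, [-0.5]))
```

In this sketch the bagging layer is meant to reduce variance across resamples while each AdaBoost model concentrates on hard examples, which is one plausible reading of how the two techniques could complement each other. A real predictor would also need an encoding of the sequence window around each candidate site, which the abstract does not describe.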
Keywords
bioinformatics; molecular biophysics; proteins; Bagging-AdaBoost ensemble approach; CDK kinase; CK2 kinase; MAPK kinase; PKA kinase; PKC kinase; SRC kinase; apoptosis; cell cycle; eukaryotic protein; membrane transport; neural network; protein kinase specific phosphorylation site; support vector machine; Adaptive boosting (AdaBoost); bagging; ensemble; kinase family; phosphorylation sites; prediction; Algorithms; Amino Acid Sequence; Bayes Theorem; Catalytic Domain; Computational Biology; Databases, Protein; Models, Statistical; Phosphorylation; Principal Component Analysis; Protein Kinases; Reproducibility of Results; Sequence Analysis, Protein; Software;
fLanguage
English
Journal_Title
NanoBioscience, IEEE Transactions on
Publisher
IEEE
ISSN
1536-1241
Type
jour
DOI
10.1109/TNB.2010.2043682
Filename
5427103