Title of article :
Predicting the state of cysteines based on sequence information
Author/Authors :
Guang, Xuanmin and Guo, Yanzhi and Xiao, Jiamin and Wang, Xia and Sun, Jing-Wei and Xiong, Wenjia and Li, Menglong
Issue Information :
Journal, serial issue, year 2010
Abstract :
A three-stage support vector machine (SVM) was constructed to predict the state of cysteines by fusing sequence information, evolution information and annotation information of protein sequences. The first and second stages predicted, respectively, whether a protein sequence contains disulfide bonds and whether all of its cysteines are involved in disulfide bonds. In the last stage, an SVM was constructed to predict which of the cysteines in these proteins are involved in disulfide bonds. The three SVMs perform well, with overall prediction accuracies of 90.05%, 96.36% and 80.00%, respectively, which indicates that the features selected in this work are effective for predicting the state of cysteines. In addition, existing methods focus almost exclusively on prediction performance and do not show how much each feature contributes to the prediction. Therefore, a feature-importance measure, the F-score function, was used to evaluate these features. The results show that, among these protein descriptors, evolution information is the most important feature for representing disulfide-containing proteins. The prediction software and data sets used in this article are freely available at http://cic.scu.edu.cn/bioinformatics/Predict_Cys.zip.
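The abstract does not spell out how the three stages are chained or how the F-score is computed. The Python sketch below shows one plausible reading under stated assumptions: (1) each protein passes through the stages in order, so a protein predicted to have no disulfide bonds never reaches stage 2, and one predicted to have all cysteines bonded never reaches stage 3; (2) the F-score is the common two-class definition, i.e. the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The stage predictors are hypothetical placeholders for trained SVMs, and `f_score` and `cysteine_states` are illustrative names, not part of the authors' Predict_Cys software.

```python
from statistics import mean, variance
from typing import Callable, Sequence


def f_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Two-class F-score of a single feature (assumed standard definition).

    pos / neg: values of one feature in positive and negative samples;
    each class needs at least two samples for a sample variance.
    A larger score suggests the feature separates the classes better.
    """
    overall = mean(list(pos) + list(neg))
    # Between-class term: how far each class mean lies from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Within-class term: sum of sample variances (n - 1 denominators).
    within = variance(pos) + variance(neg)
    if within == 0:
        # Constant within both classes: uninformative if the means also agree.
        return 0.0 if between == 0 else float("inf")
    return between / within


# Hypothetical stage predictors standing in for the three trained SVMs.
ProteinPredictor = Callable[[dict], bool]
CysteinePredictor = Callable[[dict, int], bool]


def cysteine_states(protein: dict, n_cys: int,
                    has_bond: ProteinPredictor,
                    all_bonded: ProteinPredictor,
                    cys_bonded: CysteinePredictor) -> list:
    """Return one bonded/free flag per cysteine via an assumed 3-stage cascade."""
    if not has_bond(protein):      # stage 1: protein has no disulfide bonds
        return [False] * n_cys
    if all_bonded(protein):        # stage 2: every cysteine is bonded
        return [True] * n_cys
    # Stage 3: mixed case, so classify each cysteine individually.
    return [cys_bonded(protein, i) for i in range(n_cys)]
```

For example, `f_score([1, 2, 3], [7, 8, 9])` gives 9.0, whereas the overlapping feature `f_score([1, 2, 3], [2, 3, 4])` gives 0.25, so the first feature would be ranked as more important. Chaining the stages this way means each SVM only has to handle the proteins the earlier stages could not settle, which matches the stage descriptions in the abstract but is not confirmed by it.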
Keywords :
Evolution information , Annotation information , Support vector machine , F-score function.
Journal title :
Journal of Theoretical Biology