Title of article :
Predicting the protein solubility by integrating chaos games representation and entropy in information theory
Author/Authors :
Xiaohui Niu, Feng Shi, Xuehai Hu, Jingbo Xia, Nana Li
Issue Information :
Journal issue with serial number, year 2014
Abstract :
Protein solubility is a prerequisite for many structural and functional studies. Predicting the propensity of a protein to be soluble or to form inclusion bodies is a challenging and crucial problem. To formulate protein samples in a way that reflects their intrinsic correlation with solubility, triangle, quadrangle and 12-vertex polygon chaos game representation (CGR), the concept of entropy from information theory, and amino acid and dipeptide compositions are applied, based on a different mode of pseudo amino acid composition (PseAAC). The mathematical expressions involving seven CGR methods and the amino acid and dipeptide compositions, with their corresponding entropies, are evaluated with 10-fold cross-validation and a re-substitution test. The numerical results confirm that introducing entropy can significantly improve classifier performance. The triangle CGR method surpasses the other two CGR methods in classifier construction, and it can provide sequence-order information complementary to dipeptide composition. The optimal mathematical expression combines dipeptide composition, triangle CGR and their entropies. With 2-level triangle CGR plus dipeptide composition and their corresponding entropies as features, the classifier achieved its best accuracy of 88.45% and a Matthews correlation coefficient (MCC) of 0.7588 in the 10-fold cross-validation test. In the re-substitution test, 3-level triangle CGR, dipeptide composition and their entropies performed best, with an accuracy of 92.38% and an MCC of 0.8387.
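To make the composition-plus-entropy idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the "entropy" of a composition is the Shannon entropy of its frequency vector, and that the features are simply concatenated; the abstract confirms neither detail. The function names (`amino_acid_composition`, `dipeptide_composition`, `shannon_entropy`, `feature_vector`) are hypothetical, and the CGR features are left out because the abstract does not specify how the polygons are constructed.

```python
import math
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return the 20 relative frequencies of the standard amino acids."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq:
        if residue in counts:  # skip non-standard symbols such as 'X'
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400 relative frequencies of overlapping residue pairs."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


def shannon_entropy(freqs):
    """Shannon entropy in bits; zero-frequency terms contribute nothing."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def feature_vector(seq):
    """Assumed layout: 20 AAC values, AAC entropy, 400 DPC values, DPC entropy."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return aac + [shannon_entropy(aac)] + dpc + [shannon_entropy(dpc)]
```

Under these assumptions, `feature_vector(seq)` yields a 422-dimensional vector per sequence, which could then be passed to a classifier such as the support vector machine listed in the keywords.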
Keywords :
Protein solubility , Pseudo amino acid composition , Entropy in information theory , Support Vector Machine , Chaos game representation
Journal title :
Expert Systems with Applications