Title :
Factoring tertiary classification into binary classification improves neural network for protein secondary structure prediction
Author :
Zhong, Wei ; Altun, Gulsah ; Hu, Hae-Jin ; Harrison, Rob ; Tai, Phang C. ; Pan, Yi
Author_Institution :
Dept. of Comput. Sci., Georgia State Univ., Atlanta, GA, USA
Abstract :
Protein secondary structure prediction is one of the most important problems in bioinformatics research. When the traditional tertiary classifier is used in our neural network, an accuracy of 72% is reached. Because the neural network may not perform well on three-class classification in certain domains, the three-class problem is, for the first time, reduced to six binary classification problems for protein secondary structure prediction. Using combinations of these six binary classifiers, we experiment with and test several tertiary classifiers. In addition, three new tertiary classifiers are proposed in this study: MAX_HEC, ONE_TO_ONE_MAX and ONE_TO_ONE_VOTE. ONE_TO_ONE_VOTE outperforms the six other tertiary classifiers tested in this study. The ONE_TO_ONE_VOTE tertiary classifier with the PSSM (position-specific scoring matrix) encoding scheme obtains 74.02% test accuracy on the RS126 dataset. To the best of our knowledge, this is the best result reported on the RS126 dataset with the cross-validation method for neural networks. The improvement in prediction accuracy indicates that decomposing a multiclass problem into several binary class problems may be applied to other areas of computational biology to increase the generalization power of neural networks.
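The abstract names the combination rules but does not define them. The sketch below is one plausible reading, offered only to illustrate how a three-class H/E/C decision can be assembled from binary outputs. It assumes (not stated in the abstract) that the six binary classifiers are three one-vs-rest models (H/~H, E/~E, C/~C) and three one-vs-one models (H/E, E/C, C/H), each emitting a score in [0, 1]. The functions `max_hec`, `one_to_one_vote` and `q3` are hypothetical names loosely modeled on the abstract's terms, not the authors' implementation; the 0.5 threshold and the tie-breaking rule are likewise assumptions.

```python
"""Hypothetical sketch: combining six binary classifier outputs into a
three-class (H/E/C) secondary-structure call. The classifier layout, the
0.5 threshold and the tie-breaking rule are illustrative assumptions."""

from typing import Dict

CLASSES = ("H", "E", "C")  # helix, strand, coil


def max_hec(one_vs_rest: Dict[str, float]) -> str:
    """Assumed MAX_HEC-style rule: pick the class whose one-vs-rest score
    (H/~H, E/~E or C/~C) is highest."""
    return max(CLASSES, key=lambda c: one_vs_rest[c])


def one_to_one_vote(one_vs_one: Dict[str, float],
                    one_vs_rest: Dict[str, float]) -> str:
    """Assumed ONE_TO_ONE_VOTE-style rule: each pairwise classifier casts one
    vote; a three-way tie is broken by the one-vs-rest scores."""
    votes = {c: 0 for c in CLASSES}
    for pair, score in one_vs_one.items():
        first, second = pair[0], pair[1]
        # Assumption: score > 0.5 means the first class of the pair wins.
        votes[first if score > 0.5 else second] += 1
    best = max(votes.values())
    winners = [c for c in CLASSES if votes[c] == best]
    if len(winners) == 1:
        return winners[0]
    # Tie (each class got one vote): fall back to the one-vs-rest scores.
    return max(winners, key=lambda c: one_vs_rest[c])


def q3(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy: fraction of positions predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


if __name__ == "__main__":
    rest = {"H": 0.7, "E": 0.2, "C": 0.4}
    pairs = {"HE": 0.9, "EC": 0.3, "CH": 0.4}  # votes: H, C, H
    print(max_hec(rest), one_to_one_vote(pairs, rest))  # H H
    print(q3("HHEC", "HHCC"))  # 0.75
```

With three classes, the pairwise vote either yields a unique two-vote winner or a three-way tie, which is why a single fallback rule suffices in this sketch.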
Keywords :
biology computing; molecular biophysics; neural nets; proteins; binary classification; bioinformatics research; computational biology; cross-validation method; encoding scheme; neural network; position specific scoring matrix; protein secondary structure prediction; tertiary classification; Accuracy; Computational biology; Computer science; Drugs; Encoding; Matrix decomposition; Neural networks; Protein sequence; Speech recognition; Testing;
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2004. CIBCB '04. Proceedings of the 2004 IEEE Symposium on
Print_ISBN :
0-7803-8728-7
DOI :
10.1109/CIBCB.2004.1393951