DocumentCode :
1446311
Title :
Toward Better Understanding of Protein Secondary Structure: Extracting Prediction Rules
Author :
Nguyen, Minh N. ; Zurada, Jacek M. ; Rajapakse, Jagath C.
Author_Institution :
Bioinf. Inst., Singapore, Singapore
Volume :
8
Issue :
3
Year :
2011
Firstpage :
858
Lastpage :
864
Abstract :
Although numerous computational techniques have been applied to predict protein secondary structure (PSS), only a few studies have addressed the discovery of the logic rules underlying the prediction itself. Such rules link the prediction model to the underlying biology. In addition, they make PSS prediction more interpretable by adding a degree of transparency to a predictive model that is usually regarded as a black box. In this paper, we explore the generation and use of C4.5 decision trees to extract relevant rules from PSS predictions modeled with two-stage support vector machines (TS-SVM). The proposed rules were derived on the RS126 data set of 126 nonhomologous globular proteins and on the PSIPRED data set of 1,923 protein sequences. Our approach produced sets of comprehensible, and often interpretable, rules underlying the PSS predictions. Moreover, many of the rules seem to be strongly supported by biological evidence. Further, our approach yielded good prediction accuracy, a small number of usually compact rules, and rules that generally have higher confidence levels than those generated by other rule extraction techniques.
Keywords :
biology computing; logic programming; molecular biophysics; molecular configurations; prediction theory; proteins; support vector machines; logic rules; nonhomologous globular proteins; prediction rules; protein secondary structure; protein sequences; rule extraction technique; two-stage support vector machines; Accuracy; Biological system modeling; Biology computing; Computational biology; Decision trees; Logic; Predictive models; Proteins; Sequences; Support vector machines; C4.5 decision trees; Protein structure; multiclass SVM; rule extraction; secondary structure prediction; support vector machines; Algorithms; Artificial Intelligence; Computational Biology; Data Mining; Databases, Protein; Decision Trees; Protein Structure, Secondary; Proteins
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2010.16
Filename :
5433806